Abstract


As  probabilistic  computations  play  an  increasing  role  in  solving  various  problems,  researchers  have  designed 
probabilistic  languages  to  facilitate  their  modeling.  Most  of  the  existing  probabilistic  languages,  however, 
focus  only  on  discrete  distributions,  and  there  has  been  little  effort  to  develop  probabilistic  languages  whose 
expressive  power  is  beyond  discrete  distributions.  This  dissertation  presents  a  probabilistic  language,  called 
PTP (ProbabilisTic Programming), which supports all kinds of probability distributions.

The  key  idea  behind  PTP  is  to  use  sampling  functions,  i.e.,  mappings  from  the  unit  interval  (0.0, 1.0]  to 
probability  domains,  to  specify  probability  distributions.  By  using  sampling  functions  as  its  mathematical 
basis,  PTP  provides  a  unified  representation  scheme  for  probability  distributions,  without  drawing  a  syntactic 
or  semantic  distinction  between  different  kinds  of  probability  distributions. 

Independently of PTP, we develop a linguistic framework, called $\lambda_{\bigcirc}$, to account for
computational effects in general. $\lambda_{\bigcirc}$ extends a monadic language by applying the possible
world interpretation of modal logic. A characteristic feature of $\lambda_{\bigcirc}$ is the distinction between
stateful computational effects, called world effects, and contextual computational effects, called control
effects. PTP arises as an instance of $\lambda_{\bigcirc}$ with a language construct for probabilistic choices.

We  use  a  sound  and  complete  translator  of  PTP  to  embed  it  in  Objective  CAML.  The  use  of  PTP  is 
demonstrated  with  three  applications  in  robotics:  robot  localization,  people  tracking,  and  robotic  mapping. 
Thus PTP serves as another example of a high-level language applied to a problem domain where imperative
languages have traditionally been dominant.
Chapter 1

Introduction 


This dissertation describes the design, implementation, and applications of a probabilistic language called
PTP (ProbabilisTic Programming). PTP uses sampling functions, i.e., mappings from the unit interval
(0.0, 1.0] to probability domains, to specify probability distributions. By using sampling functions in
specifying probability distributions, PTP supports all kinds of probability distributions in a uniform manner.
The  use  of  PTP  is  demonstrated  with  three  applications  in  robotics:  robot  localization,  people  tracking,  and 
robotic  mapping. 

The  contribution  of  this  dissertation  is  three-fold: 

• Sampling functions for specifying probability distributions. As most of the existing probabilistic
languages focus only on discrete distributions, probabilistic computations involving non-discrete
distributions have usually been implemented in conventional languages. Sampling functions open a new
way to specify all kinds of probability distributions, and thus serve as a mathematical basis for
probabilistic languages whose expressive power is beyond discrete distributions.

• Linguistic framework for computational effects. We develop a new linguistic framework, called
$\lambda_{\bigcirc}$, to account for computational effects in general. $\lambda_{\bigcirc}$ extends the monadic
language of Pfenning and Davies [60] by applying the possible world interpretation of modal logic. It
distinguishes between stateful computational effects (called world effects) and contextual computational
effects (called control effects), and provides a different view on how to combine computational effects at
the language design level. PTP arises as an instance of $\lambda_{\bigcirc}$ with a language construct for
probabilistic choices.

•  Applications  of  PTP  in  robotics.  In  order  to  execute  PTP  programs,  we  use  a  sound  and  complete 
translator  of  PTP  to  embed  it  in  Objective  CAML.  The  use  of  PTP  is  then  demonstrated  with  three 
applications in robotics: robot localization, people tracking, and robotic mapping. Thus PTP serves
as another example of a high-level language applied to a problem domain where imperative languages
have traditionally been dominant.


1.1  Motivation 

A probabilistic computation is a computation which makes probabilistic choices or whose result is
represented with probability distributions. As an alternative paradigm to deterministic computation, it has
been used successfully in diverse fields of computer science such as speech recognition [63, 29], natural
language processing [11], and robotics [72]. Its success lies in the fact that probabilistic approaches often
overcome the practical limitations of deterministic approaches. A trivial example is the problem of testing
whether a multivariate polynomial given by a program without branch statements is identically zero or not.
It is difficult to find a practical deterministic solution, but there is a simple probabilistic solution:
evaluate the polynomial on a randomly chosen input and check if the result is zero.
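
The randomized test described above can be sketched in Python as follows. This is an illustrative sketch, not code from the dissertation: the name `is_probably_zero`, the repetition over several trials, and the evaluation modulo a large prime are assumptions added to make the example self-contained.

```python
import random

def is_probably_zero(poly, num_vars, trials=20, modulus=2**61 - 1, rng=None):
    """Randomized identity test: evaluate `poly` at random points modulo a prime."""
    if rng is None:
        rng = random.Random()
    for _ in range(trials):
        point = [rng.randrange(modulus) for _ in range(num_vars)]
        if poly(*point) % modulus != 0:
            return False  # a nonzero value proves the polynomial is not identically zero
    return True  # every trial gave zero: identically zero with high probability

# (x + y)^2 - (x^2 + 2xy + y^2) is identically zero; (x + y)^2 - (x^2 + y^2) is not.
zero_poly = lambda x, y: (x + y) ** 2 - (x * x + 2 * x * y + y * y)
nonzero_poly = lambda x, y: (x + y) ** 2 - (x * x + y * y)
```

A `False` answer is always correct, whereas a `True` answer can be wrong only if every random point happens to be a root, which becomes less likely with each additional trial.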

As probabilistic computations play an increasing role in solving various problems, researchers have
also designed probabilistic languages to facilitate their implementation [33, 24, 74, 59, 64, 43, 53]. A
probabilistic language treats probability distributions as built-in datatypes and thus abstracts from
representation schemes, i.e., data structures for representing probability distributions. For example, a
conventional language may be extended with an abstract datatype for probability distributions, which is
specified by a certain choice of representation scheme and a set of operations on probability distributions.
As a result, it allows programmers to concentrate on how to formulate probabilistic computations at the level
of probability distributions rather than representation schemes. When translated into a probabilistic
language (by programmers), such a formulation usually produces concise and elegant code.

A typical probabilistic language supports at least discrete distributions, for which there exists a
representation scheme sufficient for all practical purposes: a set of pairs consisting of a value from the
probability domain and its probability. We can use such a probabilistic language for those problems involving
only discrete distributions. If non-discrete distributions are involved, however, we usually use a
conventional language for the sake of efficiency, assuming a specific kind of probability distribution (e.g.,
Gaussian distributions) or choosing a specific representation scheme (e.g., a set of samples from the
probability distribution). For this reason, there has been little effort to develop probabilistic languages
whose expressive power is beyond discrete distributions.

The unavailability of such probabilistic languages means that when implementing a probabilistic
computation involving non-discrete distributions, we have to resort to a conventional language. Thus we wish
to develop a probabilistic language supporting all kinds of probability distributions: discrete distributions,
continuous distributions, and even those belonging to neither group. Furthermore we wish to draw no
distinction between different kinds of probability distributions, both syntactically and semantically, so that
we can achieve a uniform framework for probabilistic computation. Such a probabilistic language can have a
significant practical impact, since once formulated at the level of probability distributions, any probabilistic
computation can be directly translated into code.

Below we present an example that illustrates the disadvantage of conventional languages in implementing
probabilistic computations and also motivates the development of PTP.


Notation 

If a variable x ranges over the domain of a probability distribution P, then P(x) means, depending on the
context, either the probability distribution itself (as in “probability distribution P(x)”) or the probability of
a particular value x (as in “probability P(x)”). We write P(x) for probability distribution P when we want
to emphasize the use of variable x. If we do not need a specific name for a probability distribution, we use
Prob (as in “probability distribution Prob(x)”).

Similarly P(x|y) means either the conditional probability P itself or the probability of x conditioned on
y. We write $P_y$ or P(·|y) for the probability distribution conditioned on y.

U(0.0, 1.0] denotes a uniform distribution over the unit interval (0.0, 1.0].


A  motivating  example  for  PTP 

A  Bayes  filter  [28]  is  a  popular  solution  to  a  wide  range  of  state  estimation  problems.  It  estimates  the  state 
s  of  a  system  from  a  sequence  of  actions  and  measurements,  where  an  action  a  induces  a  change  to  the 
state  and  a  measurement  m  gives  information  on  the  state.  At  its  core,  a  Bayes  filter  computes  a  probability 
distribution Bel(s) of the state according to the following update equations:

(1.1)  $\mathit{Bel}(s) \leftarrow \int \mathcal{A}(s \mid a, s')\, \mathit{Bel}(s')\, ds'$

(1.2)  $\mathit{Bel}(s) \leftarrow \eta\, \mathcal{P}(m \mid s)\, \mathit{Bel}(s)$

$\mathcal{A}(s \mid a, s')$ is the probability that the system transitions to state s after taking action a in
another state s', $\mathcal{P}(m \mid s)$ the probability of measurement m in state s, and $\eta$ a normalizing
constant ensuring $\int \mathit{Bel}(s)\, ds = 1.0$. The update equations are formulated at the level of
probability distributions in the sense that they do not assume a particular representation scheme.

Unfortunately the update equations are difficult to implement for arbitrary probability distributions.
When it comes to implementation, therefore, we usually simplify the update equations by making additional
assumptions on the system or choosing a specific representation scheme. For example, with the assumption
that Bel is a Gaussian distribution, we obtain a variant of the Bayes filter called a Kalman filter [79]. If Bel
is approximated with a set of samples, we obtain another variant called a particle filter [15].
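
To make the sample-based variant concrete, the following Python sketch performs one round of update equations (1.1) and (1.2) on a belief represented by a list of samples. It is an illustration under stated assumptions, not the dissertation's implementation: the one-dimensional state, the additive Gaussian action model, the Gaussian measurement model, and all names and noise parameters are hypothetical.

```python
import math
import random

def particle_filter_step(particles, action, measurement, rng,
                         motion_noise=0.1, sensor_noise=0.5):
    """One Bayes filter update on a sample-based belief over a 1-D state."""
    # Equation (1.1): push every sample s' through the (assumed) action model.
    predicted = [s + action + rng.gauss(0.0, motion_noise) for s in particles]
    # Equation (1.2): weight each sample by the (assumed) measurement likelihood.
    weights = [math.exp(-0.5 * ((measurement - s) / sensor_noise) ** 2)
               for s in predicted]
    # Resampling in proportion to the weights plays the role of the normalizer eta.
    return rng.choices(predicted, weights=weights, k=len(predicted))
```

Even in this small sketch the weight bookkeeping obscures the two update equations, and a measurement far from every sample would drive all weights to zero and make the resampling step fail.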

Even these variants of the Bayes filter are, however, not trivial to implement in conventional languages.
For example, a Kalman filter requires various matrix operations including matrix inversion. A particle
filter manipulates weights associated with individual samples, which often results in complicated code.
Since conventional languages can only simulate probability distributions, it is also difficult to figure out the
intended meaning of the code, namely the update equations for the Bayes filter.

An alternative approach is to use an existing probabilistic language after discretizing all probability
distributions. This idea is appealing in theory, but impractical for two reasons. First, given a probability
distribution, it may not be easy to choose an appropriate subset of its support upon which discretization is
performed. For example, in order to discretize a Gaussian distribution (whose support is $(-\infty, \infty)$),
we need to choose a threshold for probabilities so that discretization is confined to an interval of finite
length; for an arbitrary probability distribution, such a threshold can be computed only by examining its
entire probability domain. Even when the subset of its support is fixed in advance, the process of
discretization may incur a considerable amount of programming. For example, Fox et al. [20] develop two
non-trivial techniques (specific to their applications) for the sole purpose of efficiently manipulating
discretized probability distributions. Second, some probability distributions cannot be discretized in any
meaningful way. An example is probability distributions over probability distributions or functions, which do
occur in real applications (Chapter 5 presents such an example).

If there were a probabilistic language supporting all kinds of probability distributions, we could
implement the update equations with much less effort. PTP is a probabilistic language designed with these
goals in mind.


1.2  Previous  work 

There are a number of probabilistic languages that focus on discrete distributions. Such a language usually
provides a probabilistic construct that is equivalent to a binary choice construct. Saheb-Djahromi [69]
presents a probabilistic language with a binary choice construct $(p_1 \rightarrow e_1, p_2 \rightarrow e_2)$
where $p_1 + p_2 = 1.0$.¹ Koller, McAllester, and Pfeffer [33] present a first order functional language with a
coin toss construct flip(p) where p is a probability in (0.0, 1.0). Pfeffer [59] generalizes the coin toss
construct to a multiple choice construct dist $[p_1 : e_1, \cdots, p_n : e_n]$ where $\sum_i p_i = 1.0$. Gupta,
Jagadeesan, and Panangaden [24] present a stochastic concurrent constraint language with a probabilistic
choice construct choose x from Dom in e where Dom is a finite set of real numbers. Ramsey and Pfeffer [64]
present a stochastic lambda calculus with a binary choice construct choose p $e_1$ $e_2$. All these constructs,
although in different forms, are equivalent to a binary choice construct and have the same expressive power.

¹In this section, p (with or without indices) stands for probabilities, e program fragments, and v values.

An easy way to process a binary choice construct (or an equivalent) during a computation is to generate
a sample from the probability distribution it denotes, as in the above probabilistic languages. Another way
is to return an accurate representation of the probability distribution itself, by enumerating all elements in
its support along with their probabilities. Pless and Luger [61] present an extended lambda calculus which
uses a probabilistic construct of the form $\sum_i e_i : p_i$ where $\sum_i p_i = 1.0$. A program denoting a
probability distribution computes to a normal form $\sum_i v_i : p_i$, which is an accurate representation of
the probability distribution. Jones [30] presents a metalanguage with a binary choice construct
$e_1\ \mathsf{or}_p\ e_2$. Its operational semantics uses a judgment $e \Rightarrow \sum p_i v_i$. Mogensen [43]
presents a language for specifying die-rolls. Its denotational semantics (called probability semantics) is
formulated in a similar style, directly in terms of probability measures.

Jones and Mogensen also provide an equivalent of a recursion construct which enables programmers to
specify discrete distributions with infinite support (e.g., geometric distribution). Such a probability
distribution is, however, difficult to represent accurately because of an infinite number of elements in its
support. For this reason, Jones assumes $\sum p_i \le 1.0$ in the judgment $e \Rightarrow \sum p_i v_i$ and
Mogensen uses partial probability distributions in which the sum of probabilities may be less than 1.0. The
intuition is that a finite recursion depth is used so that some elements in the support are omitted in the
enumeration.

There  are  a  few  probabilistic  languages  supporting  continuous  distributions.  Kozen  [34]  investigates  the 
semantics  of  probabilistic  while  programs.  A  random  assignment  x  :=  random  assigns  a  random  number 
to  variable  x.  Since  it  does  not  assume  a  specific  probability  distribution  for  the  random  number  generator, 
the language serves only as a framework for probabilistic languages. Thrun [73, 74] extends C++ with
probabilistic data types which are created from a template prob<type>. Although the language, called CES,
supports  common  continuous  distributions,  its  semantics  is  not  formally  defined.  Our  work  is  originally 
motivated  by  the  desire  to  develop  a  probabilistic  language  that  is  as  expressive  as  CES  and  also  has  a 
formal  semantics. 


1.3  Sampling  functions  as  the  mathematical  basis 

The  expressive  power  of  a  probabilistic  language  is  determined  to  a  large  extent  by  its  mathematical  basis. 
That  is,  the  set  of  probability  distributions  expressible  in  a  probabilistic  language  is  determined  principally 
by  mathematical  objects  used  in  specifying  probability  distributions.  Since  we  intend  to  support  all  kinds 
of  probability  distributions  without  drawing  a  syntactic  or  semantic  distinction,  we  cannot  choose  what  is 
applicable  only  to  a  specific  kind  of  probability  distributions.  Examples  are  probability  mass  functions 
which  are  specific  to  discrete  distributions,  probability  density  functions  which  are  specific  to  continuous 
distributions,  and  cumulative  distribution  functions  which  assume  an  ordering  on  each  probability  domain. 

Probability measures [65] are a possibility because they are synonymous with probability distributions.
A probability measure $\mu$ over a domain $\mathcal{D}$ is a mapping satisfying the following conditions:

• $\mu(\emptyset) = 0$.

• $\mu(\mathcal{D}) = 1$.

• For a countable disjoint union $\cup_i D_i$ of subsets $D_i$ of $\mathcal{D}$,

$\mu(\cup_i D_i) = \sum_i \mu(D_i)$

where $\cup_i D_i$ is required to be a subset of $\mathcal{D}$.




Conceptually it maps the set of subsets of $\mathcal{D}$ (or, the set of events on $\mathcal{D}$) to
probabilities in [0.0, 1.0]. Probability measures are, however, not a practical choice as the mathematical
basis because they are difficult to represent if the domain is infinite. As an example, consider a continuous
probability distribution P of the position of a robot in a two-dimensional environment. (Since P is
continuous, the domain is infinite even if the environment is physically finite.) The probability measure
$\mu$ corresponding to P should be able to calculate a probability for any given part of the environment (as
opposed to a particular spot in the environment), whether it is a contiguous region or a collection of disjoint
regions, or whether it is rectangular or oval-shaped. Thus finding a suitable representation for $\mu$ involves
the problem of representing an arbitrary part of the environment, and is far from a routine task.

The main idea of our work is that we can specify a probability distribution by answering “How can we
generate samples from it?”, or equivalently, by providing a sampling function for it. A sampling function is
defined as a mapping from the unit interval (0.0, 1.0] to a probability domain $\mathcal{D}$. Given a random
number drawn from U(0.0, 1.0], it returns a sample in $\mathcal{D}$, and thus specifies a unique probability
distribution. In this way, random numbers serve as the source of probabilistic choices.
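
As a minimal illustration of this definition, the Python sketch below writes two sampling functions as ordinary functions of a single number u in (0.0, 1.0], one for a continuous distribution and one for a discrete distribution. The names `exponential`, `bernoulli`, and `draw` are hypothetical and chosen only for this example; they are not PTP constructs.

```python
import math
import random

def exponential(rate):
    """Sampling function for an exponential distribution (continuous)."""
    return lambda u: -math.log(u) / rate   # u lies in (0.0, 1.0], so log(u) is defined

def bernoulli(p):
    """Sampling function for a Bernoulli distribution (discrete)."""
    return lambda u: u <= p

def draw(sampling_function, rng):
    """Apply a sampling function to one random number from U(0.0, 1.0]."""
    u = 1.0 - rng.random()   # random() lies in [0.0, 1.0); flip it into (0.0, 1.0]
    return sampling_function(u)
```

Both distributions are used through the same `draw` function, which reflects the point that no distinction is needed between discrete and continuous distributions.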

In specifying how to generate samples, we wish to exploit sampling techniques developed in simulation
theory [10], most of which consume multiple (independent) random numbers to produce a single sample.
To this end, we use a generalized notion of sampling function which maps $(0.0, 1.0]^{\infty}$ to
$\mathcal{D} \times (0.0, 1.0]^{\infty}$ where $(0.0, 1.0]^{\infty}$ denotes an infinite product of (0.0, 1.0].
Operationally a sampling function now takes as input an infinite sequence of random numbers drawn
independently from U(0.0, 1.0], consumes zero or more random numbers, and returns a sample with the
remaining sequence. This generalization of the notion of sampling function is acceptable arithmetically (but
not measure-theoretically). For example, we can use the technique of expanding a single real number in
(0.0, 1.0] into an infinite sequence of real numbers in (0.0, 1.0] by taking even and odd bits of a binary
representation of a given real number to produce two real numbers and repeating the procedure.
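
The operational reading of a generalized sampling function can be sketched in Python by modeling the infinite sequence as an iterator. This is only an approximation of the mathematical definition (the iterator is advanced in place rather than copied), and the names `uniform_stream`, `constant`, `gaussian`, and `geometric` are assumptions made for illustration.

```python
import math
import random

def uniform_stream(rng):
    """Infinite sequence of independent draws from U(0.0, 1.0]."""
    while True:
        yield 1.0 - rng.random()

def constant(value):
    """Consumes zero random numbers."""
    return lambda stream: (value, stream)

def gaussian(mean, stddev):
    """Consumes exactly two random numbers (Box-Muller transform)."""
    def sample(stream):
        u1, u2 = next(stream), next(stream)
        z = math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)
        return mean + stddev * z, stream
    return sample

def geometric(p):
    """Consumes a varying number of random numbers: one per simulated coin toss."""
    def sample(stream):
        failures = 0
        while next(stream) > p:
            failures += 1
        return failures, stream
    return sample
```

Each function returns a pair of a sample and the remaining sequence, so the result of one sampling function can be passed directly to the next.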

As the mathematical basis of PTP, we choose sampling functions, which overcome the problem with
probability measures: they are applicable to all kinds of probability distributions, and are also easy to
represent because a global random number generator (which generates as many random numbers as necessary
from U(0.0, 1.0]) supplants the use of infinite sequences of random numbers. As a comparison with
probability measures, consider the probability distribution P of the position of a robot discussed above. In
devising a sampling function for P, we only have to construct an algorithm that probabilistically generates
possible positions of the robot; hence we do not need to consider the problem of representing an arbitrary
part of the environment (which is essential in the case of probability measures). Intuitively it is easier to
both formalize and answer “Where is the robot likely to be?” than “How likely is the robot to be in a given
region?”.

The  use  of  sampling  functions  as  the  mathematical  basis  leads  to  three  desirable  properties  of  PTP.  First 
it  provides  a  unified  representation  scheme  for  probability  distributions:  we  no  longer  distinguish  between 
discrete  distributions,  continuous  distributions,  and  even  those  belonging  to  neither  group.  Such  a  unified 
representation scheme is difficult to achieve with other candidates for the mathematical basis. Second it
enjoys rich expressiveness: we can specify probability distributions over infinite discrete domains, continuous
domains,  and  even  unusual  domains  such  as  infinite  data  structures  (e.g.,  trees)  and  cyclic  domains  (e.g., 
angular  values).  Third  it  enjoys  high  versatility:  there  can  be  more  than  one  way  to  specify  a  probability 
distribution,  and  the  more  we  know  about  it,  the  better  we  can  encode  it.  Section  3.2  demonstrates  these 
properties  with  various  examples  written  in  PTP. 
Data abstraction for probability distributions

In  PTP,  a  sampling  function  is  represented  by  a  probabilistic  computation  that  consumes  zero  or  more 
random numbers (rather than a single random number) drawn from U(0.0, 1.0]. In the context of data
abstraction,  it  means  that  a  probability  distribution  is  constructed  from  such  a  probabilistic  computation.  The 
expressive  power  of  PTP  allows  programmers  to  construct  (or  encode)  all  kinds  of  probability  distributions 
in  a  uniform  way.  Equally  important  is,  however,  the  question  of  how  to  observe  (or  reason  about)  a  given 
probability  distribution,  i.e.,  how  to  get  information  out  of  it,  through  various  queries.  Since  a  probabilistic 
computation  in  PTP  only  describes  a  procedure  for  generating  samples,  the  only  way  to  observe  a  probability 
distribution  is  by  generating  samples  from  it.  As  a  result,  PTP  is  limited  in  its  support  for  queries  on 
probability  distributions.  For  example,  it  does  not  permit  a  precise  implementation  of  such  queries  as  means, 
variances,  and  probabilities  of  specific  events. 

PTP  alleviates  this  limitation  by  exploiting  the  Monte  Carlo  method  [40],  which  approximately  answers 
a  query  on  a  probability  distribution  by  generating  a  large  number  of  samples  and  then  analyzing  them.  As 
an example, consider a (continuous) probability distribution P of the pose (i.e., position and orientation)
of  a  robot  in  a  two-dimensional  environment.  Here  are  a  few  queries  on  P  all  of  which  can  be  answered 
approximately: 

•  Draw  a  sample  of  robot  pose  at  random. 

•  What  is  the  expected  (average)  pose  of  the  robot? 

•  What  is  the  probability  that  the  robot  is  facing  within  five  degrees  of  due  east? 

•  What  is  the  probability  that  the  robot  is  in  Peter’s  office? 

• Under the assumption that the robot is in Peter’s office, what is the probability that the robot is within
two feet of the door?

These queries can be answered approximately by repeatedly performing the probabilistic computation
associated with P and then analyzing the resulting samples. For example, the last query can be answered as
follows:

1.  Generate  samples  from  P. 

2.  Filter  out  those  samples  indicating  that  the  robot  is  not  in  Peter’s  office. 

3.  Count the number of samples indicating that the robot is within two feet of the door, and divide it by
the total number of remaining samples.
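
The three steps above can be sketched in Python as follows. The pose distribution, the office footprint, and the door position are hypothetical placeholders (the dissertation does not specify them here), and only the position part of the pose is modeled.

```python
import math
import random

def sample_pose(rng):
    """Hypothetical stand-in for the probabilistic computation associated with P."""
    return rng.gauss(2.0, 1.5), rng.gauss(1.0, 1.5)   # (x, y) position in feet

def in_office(pose):
    x, y = pose
    return 0.0 <= x <= 10.0 and 0.0 <= y <= 8.0       # assumed office footprint

def near_door(pose, door=(0.0, 4.0), radius=2.0):
    return math.dist(pose, door) <= radius            # assumed door position

def conditional_probability(sample, condition, event, n, rng):
    samples = [sample(rng) for _ in range(n)]              # step 1: generate samples
    kept = [s for s in samples if condition(s)]            # step 2: filter by the assumption
    if not kept:
        raise ValueError("no sample satisfied the condition")
    return sum(1 for s in kept if event(s)) / len(kept)    # step 3: count and divide
```

A call such as `conditional_probability(sample_pose, in_office, near_door, 10000, random.Random(0))` returns an estimate whose accuracy depends on how many samples survive the filtering step.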

Certain queries on probability distributions are, however, difficult to answer even approximately by the
Monte Carlo method. For example, the following queries are difficult to answer approximately by a simple
analysis of samples:

•  What  is  the  most  likely  position  of  the  robot? 

•  In  what  room  is  the  robot  most  likely  to  be  when  the  number  of  rooms  is  unknown? 

Due to the nature of the Monte Carlo method, the cost of answering a query is proportional to the
number of samples used in the analysis. The cost of generating a single sample is determined by the specific
procedure chosen by programmers, rather than by the probability distribution itself from which to draw
samples. For example, a geometric distribution can be encoded with a recursive procedure which simulates
coin tosses until a certain outcome is observed, or by a simple transformation (called the inverse transform
method) which requires only a single random number. These two methods of encoding the same probability
distribution differ in the cost of generating a single sample and hence in the cost of answering the same query
by the Monte Carlo method. For a similar reason, the accuracy of the result of the Monte Carlo method,
which improves with the number of samples, is also affected by the procedure chosen by programmers.
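
The two encodings of a geometric distribution mentioned above can be compared in a short Python sketch. The function names are hypothetical, and the convention that the sample counts failures before the first success is an assumption made for this example.

```python
import math
import random

def geometric_by_coin_tosses(p, rng):
    """Count failures before the first success; uses one random number per toss."""
    if rng.random() < p:
        return 0
    return 1 + geometric_by_coin_tosses(p, rng)

def geometric_by_inverse_transform(p, rng):
    """Same distribution (for 0 < p < 1) from a single random number u in (0.0, 1.0]."""
    u = 1.0 - rng.random()
    # floor(log(u) / log(1 - p)) >= k holds exactly when u <= (1 - p)^k.
    return int(math.floor(math.log(u) / math.log(1.0 - p)))
```

The recursive version consumes on average 1/p random numbers per sample, while the inverse transform version always consumes one, so the two differ in cost even though they denote the same distribution.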

Measure-theoretic  view  of  sampling  functions 

The accepted mathematical basis of probability theory is measure theory [65], which associates every
probability distribution with a unique probability measure. We give a summary of measure theory before
discussing the connection between sampling functions and measure theory. In the discussion below, sampling
functions refer to those taking (0.0, 1.0] as input, rather than generalized ones taking $(0.0, 1.0]^{\infty}$
as input.

• Measurable sets of a space $\mathcal{D}$ are subsets of $\mathcal{D}$.

• A measurable space $M(\mathcal{D})$ is a collection of measurable sets of $\mathcal{D}$ such that:

  - $\mathcal{D} \in M(\mathcal{D})$.

  - If $S \in M(\mathcal{D})$, then $\mathcal{D} - S \in M(\mathcal{D})$. That is, $M(\mathcal{D})$ is closed
    under complement.

  - For a countable collection of measurable sets $S_i \in M(\mathcal{D})$, it holds
    $\cup_i S_i \in M(\mathcal{D})$. That is, $M(\mathcal{D})$ is closed under countable union.

• A measurable function $f$ from $\mathcal{D}$ to $\mathcal{E}$ is a mapping from $M(\mathcal{D})$ to
$M(\mathcal{E})$ such that if $S \in M(\mathcal{E})$, then $f^{-1}(S) \in M(\mathcal{D})$.

• A measure $\mu$ over $M(\mathcal{D})$ is a mapping from $M(\mathcal{D})$ to $[0.0, \infty]$ such that:

  - $\mu(\emptyset) = 0$.

  - For a countable disjoint union $\cup_i S_i$ of measurable sets $S_i \in M(\mathcal{D})$, it holds
    $\mu(\cup_i S_i) = \sum_i \mu(S_i)$.

• A probability measure $\mu$ over $M(\mathcal{D})$ satisfies $\mu(\mathcal{D}) = 1$.

• A Lebesgue measure $\nu$ over the unit interval (0.0, 1.0] is a probability measure such that $\nu(S)$ is
equal to the total length of intervals in $S$.

Measure theory allows certain (but not all) sampling functions to specify probability distributions.
Consider a sampling function $f$ from (0.0, 1.0] to $\mathcal{D}$. While it is introduced primarily as a
mathematical function, $f$ may be interpreted as a measurable function as well, in which case it defines a
unique probability measure $\mu$ over $M(\mathcal{D})$ such that

$\mu(S) = \nu(f^{-1}(S))$

where $\nu$ is a Lebesgue measure over the unit interval. The intuition is that $S$, as an event, is assigned a
probability equal to the size of its inverse image under $f$.
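
As a small worked instance of this construction, consider a two-valued sampling function. The particular function and the threshold 0.3 are chosen here purely for illustration and do not appear in the dissertation; $\mu$ denotes the induced probability measure and $\nu$ the Lebesgue measure on the unit interval.

```latex
f(u) = \begin{cases} 1 & \text{if } u \le 0.3 \\ 0 & \text{otherwise} \end{cases}
\qquad
\mu(\{1\}) = \nu(f^{-1}(\{1\})) = \nu((0.0, 0.3]) = 0.3,
\qquad
\mu(\{0\}) = \nu(f^{-1}(\{0\})) = \nu((0.3, 1.0]) = 0.7
```

The two probabilities sum to 1, as required of a probability measure over the domain $\{0, 1\}$.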

This dissertation does not investigate measure-theoretic properties of sampling functions definable in
PTP. If a probabilistic computation expressed in PTP consumes at most one random number (drawn from
U(0.0, 1.0]), it is easy to identify a corresponding sampling function. If more than one random number is
consumed, however, it is not always obvious how to construct such a sampling function. In fact, the presence
of fixed point constructs in PTP (for recursive computations which can consume an arbitrary number of random
numbers) makes it difficult even to define measurable spaces to which sampling functions map the unit
interval, since fixed point constructs use domain-theoretic structures, rather than measure-theoretic
structures, in order to solve the resulting recursive equations.


Every probabilistic computation expressed in PTP is easily translated into a generalized sampling
function (which takes $(0.0, 1.0]^{\infty}$ as input). It is, however, unknown if generalized sampling functions
definable in PTP are all measurable. Also unknown is if generalized sampling functions are
measure-theoretically equivalent to ordinary sampling functions (i.e., if a measurable function from
$(0.0, 1.0]^{\infty}$ to $\mathcal{D} \times (0.0, 1.0]^{\infty}$ determines a unique measurable function from
(0.0, 1.0] to $\mathcal{D}$). Nevertheless generalized sampling functions definable in PTP are shown to be
closely connected with sampling techniques from simulation theory, which, like measure theory, are widely
agreed to be a form of probabilistic computation and which PTP is designed to support. A further discussion
is found in Section 3.3.


1.4  Linguistic  framework  for  PTP

We develop PTP as a functional language extending the λ-calculus, rather than an imperative language or a
library embedded in an existing conventional language. We decide to use a monadic syntax for probabilistic
computations. The decision is based upon two observations. First, sampling functions are operationally
equivalent to probabilistic computations in that they describe procedures for generating samples from
infinite sequences of random numbers. Second, sampling functions form a state monad [44, 45, 64] whose
set of states is $(0.0, 1.0]^{\infty}$. These two observations imply that if we use a monadic syntax for probabilistic
computations, it becomes straightforward to interpret probabilistic computations in terms of sampling
functions. The monadic syntax treats probability distributions as first-class values and offers a clean separation
between regular values and probabilistic computations.
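
The second observation can be sketched in OCaml as a state monad whose state is an infinite stream of random numbers. This is only an illustrative sketch and not PTP's syntax or its translation: the types `stream` and `prob`, the operations `return`, `bind`, and `draw`, and the deterministic helper `cycle` (which stands in for a real random source and assumes a non-empty list) are all names invented for the example.

```ocaml
(* The state: an infinite stream of numbers from (0.0, 1.0]. *)
type stream = Cons of float * (unit -> stream)

(* A probabilistic computation consumes a prefix of the stream and returns
   a sample together with the unconsumed remainder. *)
type 'a prob = stream -> 'a * stream

(* A regular value viewed as a computation that consumes nothing. *)
let return (x : 'a) : 'a prob = fun s -> (x, s)

(* Sequencing: run m, then pass the remaining stream to the continuation. *)
let bind (m : 'a prob) (k : 'a -> 'b prob) : 'b prob =
  fun s -> let (x, s') = m s in k x s'

(* Consume exactly one random number. *)
let draw : float prob = fun (Cons (u, rest)) -> (u, rest ())

(* A computation that consumes two random numbers and adds them. *)
let sum_of_two : float prob =
  bind draw (fun u1 -> bind draw (fun u2 -> return (u1 +. u2)))

(* Deterministic stand-in for a random stream: cycles through a
   non-empty list forever. *)
let rec cycle (xs : float list) : stream =
  let rec go = function
    | [] -> cycle xs
    | y :: ys -> Cons (y, fun () -> go ys)
  in
  go xs

let () =
  let (x, _) = sum_of_two (cycle [0.25; 0.5]) in
  Printf.printf "sample: %f\n" x
```

Here `sum_of_two` is itself a first-class value of type `float prob`, which mirrors the separation between regular values and probabilistic computations.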

Instead of designing a monadic syntax specialized for sampling functions, we begin by developing a
linguistic framework λ○ which accounts for computational effects in general. λ○ does not borrow its syntax
from Moggi's monadic metalanguage $\lambda_{ml}$ [44, 45]. Instead it extends the monadic language of Pfenning
and Davies [60], which is a reformulation of $\lambda_{ml}$ from a modal logic perspective. λ○ may be thought of as
their monadic language combined with the possible world interpretation [35] of modal logic.

A characteristic feature of λ○ is that it classifies computational effects into two kinds: world effects and
control effects. World effects are stateful computational effects such as mutable references and input/output;
control effects are contextual computational effects such as exceptions and continuations. Probabilistic
choices are a particular case of world effect, and PTP arises as an instance of λ○ with a language construct
for consuming (or drawing) random numbers from U(0.0, 1.0].


1.5  Applications  to  robotics 

Instead of implementing PTP as a complete programming language of its own, we embed it in an existing
functional language by building a translator. Specifically we extend the syntax of Objective CAML [2] to
incorporate the syntax of PTP, and then translate language constructs of PTP back into the original syntax.
The translator is sound and complete in the sense that both type and reducibility of any program in PTP,
whether well-typed/reducible or ill-typed/irreducible, are preserved when translated into Objective CAML.

An important part of our work is to demonstrate the use of PTP by applying it to real problems. As
the main testbed, we choose robotics [72]. It offers a variety of real problems that necessitate probabilistic
computations over continuous distributions. We use PTP for three applications in robotics: robot
localization [72], people tracking [50], and robotic mapping [75]. In each case, the state of a robot is represented
with a probability distribution, whose update equation is formulated at the level of probability distributions
and translated directly in PTP. All experiments in our work have been carried out with real robots.

A comparison between our robot localizer and another written in C gives evidence that the benefit of
implementing probabilistic computations in PTP, such as readability and conciseness of code, can outweigh
its disadvantage in speed (see Section 5.5 for details). Thus PTP serves as another example of a high-level
language whose power is well exploited in a problem domain where imperative languages have been
traditionally dominant.

1.6  Outline 

The rest of this dissertation is organized as follows. Chapter 2 presents the linguistic framework λ○ to
be used for PTP. Chapter 3 presents the syntax, type system, and operational semantics of PTP. Chapter 4
describes the translator of PTP in Objective CAML. Chapter 5 presents three applications of PTP in robotics.
Chapter 6 concludes.

Chapter  2

Linguistic  Framework 


This chapter presents our linguistic framework λ○ to be used for PTP. λ○ is an extension of the λ-calculus
(with a modality ○) which accounts for computational effects in general. In developing λ○, we are interested
in modeling such computational effects as input/output, mutable references, and continuations. We view
probabilistic choices as a particular case of computational effect, and PTP arises as an instance of λ○ with a
language construct for probabilistic choices.

Key concepts used in the development of λ○ are as follows:

•  Segregation of world effects and control effects. λ○ classifies computational effects into two kinds:
stateful world effects and contextual control effects. The distinction makes it easy to combine
computational effects at the language design level.

•  Possible world interpretation of modal logic. λ○ uses modal logic [12] to characterize world effects,
and relates modal logic to world effects by the possible world interpretation [35]. As a result, the
notion of world in “world effects” coincides with the notion of world in the “possible world
interpretation.” In formulating the logic for λ○, we use the judgmental style of Pfenning and Davies [60].

At its core, λ○ applies the possible world interpretation to the monadic language of Pfenning and
Davies [60], which uses lax logic [19, 7] in the judgmental style to reformulate Moggi's monadic
metalanguage $\lambda_{ml}$ [44, 45]. The monadic language of Pfenning and Davies analyzes computational effects only
at an abstract level from a proof-theoretic perspective, and does not readily extend to a programming
language with computational effects. λ○ is an attempt to extend their monadic language with an operational
semantics so as to support concrete notions of computational effect. The key idea is to combine the
possible world interpretation and the judgmental style in such a way that the accessibility relation (which is an
integral part of the possible world interpretation) is not used in inference rules (unlike the system of modal
logic of Simpson [71], for example).

Although λ○ is not specific to probabilistic computations and the development of λ○ is thus optional
for the purpose of designing PTP, we investigate λ○ to better explain the logical foundation of PTP. As the
definition of PTP in Chapter 3 is self-contained, this chapter can be skipped without loss of continuity by
those readers who want to understand only PTP.


2.1  Computational  effects  in  λ○

This section gives a definition of computational effects. The clarification of the notion of computational
effect may appear to be of little significance (because we already know what are called computational effects
and how they work), but it has a profound impact on the overall design of λ○. This section also gives an
overview of λ○ at an abstract level (i.e., without its syntax and semantics).

Definition  of  computational  effects 

In the context of functional languages, computational effects are usually defined as what destroys the
“purity” of functional languages. Informally the purity of a functional language means that every function in
it denotes a mathematical function, i.e., a black box converting a valid argument into a unique outcome.
For example, a function fn x => x + !y in ML does not denote a mathematical function because its
outcome depends on the content of reference y as well as argument x; hence we conclude that mutable
references are computational effects. Other examples of computational effects include input/output, exceptions,
continuations, non-determinism, concurrency, and probabilistic choices.
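
The ML example can be reproduced directly in OCaml. The sketch below is a plain transcription; the initial content of `y` and the argument values are arbitrary choices made for illustration.

```ocaml
(* y is a mutable reference; f reads it on every call. *)
let y = ref 1
let f x = x + !y

let () =
  let before = f 10 in   (* y holds 1 at this point *)
  y := 5;                (* a world change that f cannot see in its argument *)
  let after = f 10 in    (* same argument, different content of y *)
  Printf.printf "f 10 = %d, then f 10 = %d\n" before after
```

The two calls receive the same argument but need not return the same outcome, which is exactly why `f` fails to denote a mathematical function of `x` alone.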

The notion of purity, however, is subtle and there is no universally accepted definition of purity. Sabry [67]
shows that common criteria for purity, such as soundness of the β-equational axiom, confluence (the
Church-Rosser property or independence of order of evaluation), and preservation of observational equivalences,
are incomplete in that either they fail to hold in some pure functional languages or they continue to hold
in some impure functional languages (referential transparency is not considered because it does not have a
universally accepted definition). He proposes a definition of purity based upon independence of reduction
strategies, but this definition has a drawback that a given functional language must have implementations of
three reduction strategies, namely, call-by-value, call-by-need, and call-by-name.

As a result, the definition of computational effects as what destroys the purity of functional languages is
ambiguous, and some concepts are called computational effects without any justification. For example,
non-termination is called a computational effect only by convention (as a special kind of computational effect
which is not observable). At the same time, one may argue that non-termination is not a computational effect
because the use of pointed types (i.e., types augmented with a bottom element ⊥ denoting non-termination)
preserves the property of mathematical functions.

A definition of computational effects is not necessary in designing a functional language, such as ML
and Scheme, that allows any program fragment to produce computational effects. It is, however, crucial
to the design of a functional language, such as Haskell 98¹ [55] (and λ○), that subsumes a sublanguage
for computational effects, since a criterion for computational effects determines features supported by the
sublanguage. The case of Haskell illustrates the importance of a proper definition of computational effects,
and also inspires our definition of computational effects.

Computational  effects  in  Haskell 

Since their introduction to the programming language community, monads [44, 45] have been considered
as an elegant means of structuring functional programs and incorporating computational effects into
functional languages [76, 77]. A good example of a functional language that makes extensive use of monads
in its design is Haskell. At the programming level, it provides a type class Monad to facilitate modular
programming; at the language design level, it provides a built-in IO monad for producing computational
effects without compromising its properties as a pure functional language.

Haskell does not assume a particular definition of computational effects. Instead it implicitly identifies
computational effects with monads and confines all kinds of computational effects to the IO monad [56, 58]
(or a similar one such as the ST monad). Thus Haskell conceptually consists of two sublanguages: a
functional sublanguage which never produces computational effects, and a monadic sublanguage which is
formed by the IO monad.

¹ Abbreviated as Haskell henceforth.

The identification between computational effects and monads may appear to be innocuous, perhaps
because of the success of monads as a means of modeling different computational effects in a uniform
manner. When all kinds of computational effects are present together, however, the identification becomes
problematic because monads do not combine well with each other [32, 31, 39]. Haskell uses the IO monad
for all kinds of computational effects without explicitly addressing this difficulty.

The  identification  also  enforces  unconventional  treatments  of  some  computational  effects.  For  example, 
it  disallows  exceptions  for  the  functional  sublanguage,  which  would  be  useful  for  handling  division  by 
zero  or  pattern-match  failures.  It  also  disallows  continuations  for  the  functional  sublanguage,  which  would 
be  useful  for  implementing  advanced  control  constructs  such  as  non-local  exits  and  co-routines.  Hence 
the  identification  significantly  limits  the  practical  utility  of  exceptions  and  continuations.  For  this  reason,  an 
extension  of  Haskell  proposed  by  Peyton  Jones  et  al.  [57]  allows  exceptions  not  for  the  monadic  sublanguage 
but  for  the  functional  sublanguage,  thereby  deviating  from  the  identification  between  computational  effects 
and  monads. 

Our  view  is  that  computational  effects  are  not  identified  with  monads  and  that  the  identification  between 
computational  effects  and  monads  in  Haskell  is  a  consequence  of  lack  of  a  proper  definition  of  computational 
effects.  The  capability  of  monads  to  model  all  kinds  of  computational  effects  may  be  the  rationale  for  the 
identification,  but  it  does  not  really  warrant  the  identification;  rather  it  only  implies  that  monads  are  a 
particular  tool  for  studying  the  denotational  semantics  of  computational  effects. 

As an example, consider the set monad for modeling non-determinism [76].² The set monad is suitable
for specifying the denotational semantics of a non-deterministic language (which has a non-deterministic
choice construct), since a program can be translated into a set enumerating all possible outcomes. The set
monad does not, however, lend itself to the operational design of a non-deterministic language, in which a
program returns a single outcome, instead of the set of all possible outcomes, after producing computational
effects. Therefore the set monad is useful for developing the denotational semantics (and also possibly
the syntax) of a non-deterministic language, but not for implementing it operationally. In fact, if the set
monad were enough for implementing a non-deterministic language operationally, we could argue that the
built-in IO monad is unnecessary in Haskell because we can instantiate the type class Monad to mimic
all computational effects supported by the IO monad. Thus the main lesson learned from Haskell is that
modeling a computational effect is a separate issue from implementing it operationally.
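
The contrast between modeling and implementing non-determinism can be sketched in OCaml. The code is illustrative only: lists stand in for sets, and the names `choose`, `all_sums`, `choose_one`, and `one_sum` are invented for the example, as is the use of a caller-supplied coin flip.

```ocaml
(* Denotational view: a non-deterministic program denotes the collection of
   all its possible outcomes (a list standing in for a set). *)
let return x = [x]
let bind m k = List.concat (List.map k m)
let choose a b = [a; b]

(* All sums x + y with x chosen from {1, 2} and y chosen from {10, 20}. *)
let all_sums =
  bind (choose 1 2) (fun x ->
  bind (choose 10 20) (fun y ->
  return (x + y)))

(* Operational view: a single run resolves each choice with a coin flip and
   returns exactly one outcome. *)
let choose_one (flip : unit -> bool) a b = if flip () then a else b

let one_sum (flip : unit -> bool) =
  let x = choose_one flip 1 2 in
  let y = choose_one flip 10 20 in
  x + y

let () =
  Printf.printf "number of possible outcomes: %d\n" (List.length all_sums);
  Printf.printf "one run returned: %d\n" (one_sum Random.bool)
```

`all_sums` enumerates every outcome, whereas `one_sum` yields one member of that collection per run; only the latter matches what a running non-deterministic program does.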

Another lesson learned from Haskell is that as its implementation is based upon a state monad, the
IO monad is suitable for stateful computational effects such as mutable references and input/output, but
not compatible with contextual computational effects such as exceptions and continuations. That is, while
stateful computational effects may well be identified with the IO monad, contextual computational effects
do not need to be restricted to the monadic sublanguage. Our definition of computational effects captures
the distinction between these two kinds of computational effects, calling the former world effects and the
latter control effects.

World  effects  and  control  effects 

We directly define computational effects without relying on another notion such as purity of functional
languages. A central assumption is that the run-time system consists of a program and a world. A program
is subject to a set of reduction rules. For example, a program in the λ-calculus runs by applying the
β-reduction rule. A world is an object whose behavior is specified by the programming environment rather
than by reduction rules. For example, a keyboard buffer can be part of a world such that a keystroke or a
read operation changes its contents. In contrast, a heap is not part of a world because it is just a convenience
for implementing reduction rules. That is, we can implement all reduction rules without using heaps at all.

² If the reader holds the view that computational effects and monads are identified, this example may well be hard to follow!

When an external agent or a program interacts with a world and causes a transition to another world, we
say that a world effect occurs. For example, if a keyboard buffer is part of a world, a keystroke by a user or
a read operation by a program changes its contents and thus causes a world effect. As another example, if a
store for mutable references is part of a world, an operation to allocate, dereference, or deallocate references
interacts with the world and thus causes a world effect.

When a program undergoes a change that no sequence of reduction rules can induce, we say that a
control effect occurs. For example, if the β-reduction rule is the only reduction rule, raising an exception
causes a control effect because in general, it induces a change that is independent of the β-reduction rule.
For a similar reason, capturing and throwing continuations cause control effects. Note that the concept of
control effect is relative to the set of “basic” reduction rules assumed by the run-time system. One could
imagine a run-time system with built-in reduction rules for exceptions, in which case raising an exception
would not be regarded as a control effect.

Thus world effects and control effects have fundamentally different characteristics and are realized in
different ways. World effects are realized by specifying a world structure: an empty world structure if there
are no world effects, a keyboard buffer and display window for input/output, a store for mutable references, and
so on. Control effects are realized by introducing program transformation rules (that cannot be defined in
terms of existing reduction rules). Since world structures and program transformation rules are concerned
with different parts of the run-time system, world effects and control effects are treated in orthogonal ways.

The distinction between world effects and control effects makes it easy to combine computational
effects at the language design level. Different world effects are combined by merging corresponding world
structures. For example, a world structure with a keyboard buffer and display window and a store realizes
both input/output and mutable references. There is no need to explicitly combine control effects with other
computational effects, since control effects become pervasive once corresponding program transformation
rules are introduced.
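
Merging world structures can be pictured with OCaml records. The sketch below is not the world structure defined for the framework in this dissertation: the record types, the representation of a store as an association list from integer locations to integers, and the operations `read_key`, `print_str`, `write_ref`, and `read_ref` are all hypothetical.

```ocaml
(* World structure for input/output: a keyboard buffer and a display. *)
type io_world = { keyboard : char list; display : string }

(* World structure for mutable references: a store mapping locations to
   integer contents. *)
type store_world = (int * int) list

(* Merged world structure realizing both kinds of world effect. *)
type world = { io : io_world; store : store_world }

(* Each operation is a transition between worlds that touches only the
   component it is concerned with. *)
let read_key (w : world) : char option * world =
  match w.io.keyboard with
  | [] -> (None, w)
  | c :: rest -> (Some c, { w with io = { w.io with keyboard = rest } })

let print_str (s : string) (w : world) : world =
  { w with io = { w.io with display = w.io.display ^ s } }

let write_ref (loc : int) (v : int) (w : world) : world =
  { w with store = (loc, v) :: List.remove_assoc loc w.store }

let read_ref (loc : int) (w : world) : int option =
  List.assoc_opt loc w.store

let () =
  let w0 = { io = { keyboard = ['a'; 'b']; display = "" }; store = [] } in
  let (_, w1) = read_key w0 in
  let w2 = write_ref 0 42 (print_str "ok" w1) in
  Printf.printf "display = %s, store has %d entries\n"
    w2.io.display (List.length w2.store)
```

Because the input/output operations never inspect the store and the reference operations never inspect the buffers, the two groups of operations can be written independently and combined by placing their components side by side.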

World effects are further divided into internal world effects and external world effects. An internal
world effect is always caused by a program and is ephemeral in the sense that the change it makes to a
world can be undone by the run-time system. An example is to allocate new references, which can be later
reclaimed by the run-time system. An external world effect is caused either by an external agent, affecting
a program, or by a program, affecting an external agent. It is perpetual in the sense that the change it makes
to a world cannot be undone by the run-time system. An example is to use keyboard input or to send output
to a printer: once you type a password to a malicious program or print it on a public printer, there is no
going back from the catastrophic consequence!

While internal world effects occur within the run-time system, external world effects involve interactions
with external agents. In this regard, all external world effects are examples of concurrency in the presence
of external agents. λ○ is not intended to model external agents, and we restrict ourselves to internal world
effects in developing λ○.

From  Haskell  to  λ○

As mentioned earlier, Haskell conceptually consists of two sublanguages: 1) a functional sublanguage which
is essentially the λ-calculus and never produces computational effects; 2) a monadic sublanguage which is
formed by the IO monad and produces both world effects and control effects. Peyton Jones [58] clarifies
the distinction between the two sublanguages with a two-level semantics: an inner denotational semantics
for the functional sublanguage and an outer transition (operational) semantics for the monadic sublanguage.

As control effects do not need to be restricted to the monadic sublanguage, we consider a variant of
Haskell that allows both its functional and monadic sublanguages to produce control effects. In comparison
with Haskell, this variant has a disadvantage that a function may not denote a mathematical function, but it
overcomes the limitation of Haskell in dealing with control effects.

λ○ can be thought of as a reformulation of the variant of Haskell from a logical perspective. It has
two syntactic categories: terms and expressions. Terms form a sublanguage which subsumes the λ-calculus
and is allowed to produce only control effects; expressions form another sublanguage which is allowed to
produce both world effects and control effects. The logic behind the definition of expressions is the same
as the logic underlying monads, namely lax logic [7]. Thus, like the monadic sublanguage of Haskell,
expressions in λ○ enforce the monadic syntax (with the modality ○).


2.2  Logical  preliminaries 

λ○ has a firm logical foundation, providing a logical analysis of computational effects. This section explains
those concepts from logic that play key roles in the development of λ○.

2.2.1  Curry-Howard  isomorphism  and  judgmental  formulation 

The Curry-Howard isomorphism [27] is a principle connecting logic and programming languages. It states
that propositions in logic correspond to types in programming languages (propositions-as-types
correspondence) and that proofs in logic correspond to programs in programming languages (proofs-as-programs
correspondence). Given a formulation of logic, it systematically derives the type system and reduction
rules of a corresponding programming language. The development of λ○ follows the same pattern: we first
formulate the logic for λ○, and then apply the Curry-Howard isomorphism to obtain the type system and
reduction rules.

The logic for λ○ is formulated in the judgmental style of Pfenning and Davies [60]. A judgmental
formulation of logic adopts Martin-Löf's methodology of distinguishing between propositions and
judgments [42]. It differs from a traditional formulation which relies solely on propositions. Below we review
results from Pfenning and Davies [60].

Propositions  and  judgments

In a judgmental formulation of logic, a proposition is an object of verification whose truth is checked by
inference rules, whereas a judgment is an object of knowledge which becomes evident by a proof. Examples
of propositions are ‘1 + 1 is equal to 0’ and ‘1 + 1 is equal to 2’, both under inference rules based upon
arithmetic. Examples of judgments are “‘1 + 1 is equal to 0’ is true”, for which there is no proof, and
“‘1 + 1 is equal to 2’ is true”, for which there is a proof.

To  clarify  the  difference  between  propositions  and  judgments,  consider  a  statement  ‘the  moon  is  made 
of  cheese.’  The  statement  is  not  yet  an  object  of  verification,  or  a  proposition,  since  there  is  no  way  to  check 
its truth. It becomes a proposition when an inference rule is given, for example, (written in a pedantic way)
“‘the moon is made of cheese’ is true if ‘the moon is greenish white and has holes in it’ is true.” Now we
can  attempt  to  verify  the  proposition,  for  example,  by  taking  a  picture  of  the  moon.  That  is,  we  still  do 
not  know  whether  the  proposition  is  true  or  not,  but  by  virtue  of  the  inference  rule,  we  know  at  least  what 
counts  as  a  verification  of  it.  If  the  picture  indeed  shows  that  the  moon  is  greenish  white  and  has  holes  in 
it,  the  inference  rule  makes  evident  the  judgment  “‘the  moon  is  made  of  cheese’  is  true.”  Now  we  know 
“‘the  moon  is  made  of  cheese’  is  true”  by  the  proof  consisting  of  the  picture  and  the  inference  rule.  Thus 
a  proposition  is  an  object  of  verification  which  may  or  may  not  be  true,  whereas  a  judgment  is  an  object  of 
knowledge  which  we  either  know  or  do  not  know. 

As a more concrete example, consider the conjunction connective $\wedge$. In order for $A \wedge B$ to be a
proposition, we need a way to check its truth. Since $A \wedge B$ is intended to be true whenever both $A$ and $B$ are true,
we use the following inference rule to explain $A \wedge B$ as a proposition; we assume that both $A$ and $B$ are
propositions, and abbreviate a truth judgment “$A$ is true” as $A\ \mathit{true}$:

$$\frac{A\ \mathit{true} \qquad B\ \mathit{true}}{A \wedge B\ \mathit{true}}\ \wedge\mathsf{I}$$

The rule $\wedge\mathsf{I}$ says that if $A$ is true and $B$ is true, then $A \wedge B$ is true. It follows the usual interpretation of an
inference rule: if the premises hold, then the conclusion holds. We use the rule $\wedge\mathsf{I}$ to construct a proof $\mathcal{D}$ of
$A \wedge B\ \mathit{true}$ from a proof $\mathcal{D}_A$ of $A\ \mathit{true}$ and a proof $\mathcal{D}_B$ of $B\ \mathit{true}$; we write
$\begin{array}{c}\mathcal{D}_A\\ A\ \mathit{true}\end{array}$ to mean that $\mathcal{D}_A$ is a proof of $A\ \mathit{true}$:

$$\mathcal{D} \;=\; \frac{\begin{array}{c}\mathcal{D}_A\\ A\ \mathit{true}\end{array} \qquad \begin{array}{c}\mathcal{D}_B\\ B\ \mathit{true}\end{array}}{A \wedge B\ \mathit{true}}\ \wedge\mathsf{I}$$

Thus $A \wedge B$ is a proposition because we can check its truth according to the rule $\wedge\mathsf{I}$, whereas $A \wedge B\ \mathit{true}$ is
a judgment because we either know it or do not know it, depending on the existence of a proof.

The rule $\wedge\mathsf{I}$ above is called an introduction rule for the conjunction connective $\wedge$, since its conclusion
deduces a truth judgment with $\wedge$, or introduces $\wedge$. A dual concept is an elimination rule, whose premises
exploit a truth judgment with $\wedge$ to prove another judgment in the conclusion, or eliminate $\wedge$. In the case of
$\wedge$, there are two elimination rules, $\wedge\mathsf{E}_L$ and $\wedge\mathsf{E}_R$:

$$\frac{A \wedge B\ \mathit{true}}{A\ \mathit{true}}\ \wedge\mathsf{E}_L \qquad\qquad \frac{A \wedge B\ \mathit{true}}{B\ \mathit{true}}\ \wedge\mathsf{E}_R$$

These elimination rules make sense because $A \wedge B\ \mathit{true}$ implies both $A\ \mathit{true}$ and $B\ \mathit{true}$. We will later
discuss their properties in a more formal way.
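
Under the propositions-as-types reading described in Section 2.2.1, the rules for conjunction have a familiar programming counterpart: a proof of a conjunction is a pair. The OCaml sketch below uses this standard reading for illustration only; the function names are invented, and the sketch is not the term language developed in this dissertation.

```ocaml
(* Introduction: from a proof of A and a proof of B, build a proof of the
   conjunction, represented as a pair. *)
let and_intro (a : 'a) (b : 'b) : 'a * 'b = (a, b)

(* Left and right elimination: project the proof of A or the proof of B
   out of a proof of the conjunction. *)
let and_elim_left ((a, _) : 'a * 'b) : 'a = a
let and_elim_right ((_, b) : 'a * 'b) : 'b = b

let () =
  let d = and_intro 1 "one" in
  Printf.printf "%d %s\n" (and_elim_left d) (and_elim_right d)
```

Eliminating a conjunction that was just introduced returns the original component, which is the computational content of the claim that the elimination rules make sense.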

It is important that in a judgmental formulation of logic, the notion of judgment takes priority over the
notion of proposition. Specifically the notion of judgment does not depend on propositions, and a new
kind of judgment is defined only in terms of existing judgments (but without using existing connectives or
modalities). On the other hand, propositions are always explained with existing judgments (including at least
truth judgments), and a new connective or modality is defined so as to compactly represent the knowledge
expressed by existing judgments. For example, we could define a falsehood judgment $A\ \mathit{false}$ as “$A\ \mathit{true}$
does not hold,” and then use a new modality $\neg$ with the following introduction rule:

$$\frac{A\ \mathit{false}}{\neg A\ \mathit{true}}\ \neg\mathsf{I}$$

We say that the rule $\neg\mathsf{I}$ internalizes $A\ \mathit{false}$ as a proposition $\neg A$.

If the definition of a connective or modality involves another connective or modality, we say that
orthogonality is destroyed in the sense that the two connectives or modalities cannot be developed independently,
or orthogonally. In this dissertation, we use no connective or modality destroying orthogonality.

Categorical  judgments  and  hypothetical  judgments

A judgment such as “$A$ is true” is called a categorical judgment because it involves no hypotheses and is
thus unconditional. Another judgment that we need is a hypothetical judgment, which involves hypotheses.
A general form of hypothetical judgment reads “if judgments $J_1, \cdots, J_n$ hold, then a judgment $J$ holds,”
written as $J_1, \cdots, J_n \vdash J$. We refer to $J_i$, $1 \le i \le n$, as an antecedent and $J$ as the succedent.

A hypothetical judgment $J_1, \cdots, J_n \vdash J$ becomes evident by a proof of $J$ in which $J_1, \cdots, J_n$ are
assumed to be evident without proofs. Such a proof $\mathcal{D}$ is called a hypothetical proof and is written as
follows:

$$\mathcal{D} \quad \left.\begin{array}{c} J_1 \ \cdots \ J_n \\ \vdots \\ J \end{array}\right\}\ \text{inference rules}$$

Inference rules here use judgment $J_i$ without requiring a proof, that is, as a hypothesis. We say that a
hypothesis $J_i$ is discharged when inference rules use it to deduce $J_i$. Note that a hypothetical proof of $\cdot \vdash J$
(with no antecedent) is essentially a proof of judgment $J$ and vice versa, since both proofs show that $J$ holds
categorically.³

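The notion of hypothetical proof is illustrated by the implication connective $\supset$. In order for $A \supset B$ to
be a proposition, we need a way to check its truth. Since $A \supset B$ is intended to be true whenever $A\ \mathit{true}$
implies $B\ \mathit{true}$, the introduction rule uses a hypothetical proof in its premise:

$$\frac{\begin{array}{c}[A\ \mathit{true}]\\ \vdots\\ B\ \mathit{true}\end{array}}{A \supset B\ \mathit{true}}\ \supset\mathsf{I}$$

The elimination rule for $\supset$ exploits $A \supset B\ \mathit{true}$ in its premises to prove $B\ \mathit{true}$ in its conclusion:

$$\frac{A \supset B\ \mathit{true} \qquad A\ \mathit{true}}{B\ \mathit{true}}\ \supset\mathsf{E}$$

The rule $\supset\mathsf{E}$ makes sense because $A \supset B\ \mathit{true}$ licenses us to deduce $B\ \mathit{true}$ if $A\ \mathit{true}$ holds, which is the
case by the second premise.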

Our definition of hypothetical judgments makes two implicit assumptions: 1) the order of antecedents
is immaterial; 2) an antecedent may be used zero or more times in a hypothetical proof. These assumptions
are formally stated in the three structural rules of hypothetical judgments:

(Exchange) If $J_1, \cdots, J_i, J_{i+1}, \cdots, J_n \vdash J$,
then $J_1, \cdots, J_{i+1}, J_i, \cdots, J_n \vdash J$.

(Weakening) If $J_1, \cdots, J_n \vdash J$,
then $J_1, \cdots, J_n, J_{n+1} \vdash J$ for any judgment $J_{n+1}$.

(Contraction) If $J_1, \cdots, J_i, J_i, \cdots, J_n \vdash J$,
then $J_1, \cdots, J_i, \cdots, J_n \vdash J$.

^This equivalence does not mean that a hypothetical judgment $\cdot \vdash J$ is equivalent to judgment $J$. While the former states that
$J$ holds categorically, the latter is unaware of whether there are hypotheses or not, and could even be a hypothesis in a hypothetical
proof. For example, from the assumption that $J$ implies $J'$, we can show that $\cdot \vdash J$ implies $\cdot \vdash J'$. The converse is not the case,
however.

A hypothetical proof can be combined with another hypothetical proof. For example, a hypothetical
proof $\mathcal{D}$ of $J_1, \cdots, J_n \vdash J$ is combined with a hypothetical proof $\mathcal{E}_1$ of $J_2, \cdots, J_n \vdash J_1$ to produce another
hypothetical proof, written as $[\mathcal{E}_1/J_1]\mathcal{D}$, of $J_2, \cdots, J_n \vdash J$:

$$
[\mathcal{E}_1/J_1]\mathcal{D} \;=\;
\begin{array}{c}
\begin{array}{c} J_2 \ \cdots \ J_n \\ \mathcal{E}_1 \\ J_1 \end{array} \quad J_2 \ \cdots \ J_n \\
\mathcal{D} \\
J
\end{array}
$$

Note that hypotheses $J_2, \cdots, J_n$ may be used twice: when proving $J_1$ in $\mathcal{E}_1$ and when proving $J$ in $\mathcal{D}$. This
property of hypothetical judgments that a hypothetical proof can be substituted into another hypothetical
proof is called the substitution principle:

• (Substitution principle) If $\Gamma \vdash J$ and $\Gamma, J \vdash J'$, then $\Gamma \vdash J'$.

A convenient way to prove hypothetical judgments is to use inference rules for hypothetical judgments
without relying on hypothetical proofs. For example, we can explain the implication connective $\supset$ with the
following inference rules for hypothetical judgments; we abbreviate a collection of antecedents as $\Gamma$:

$$
\frac{\Gamma, A\ \mathit{true} \vdash B\ \mathit{true}}{\Gamma \vdash A \supset B\ \mathit{true}}\ {\supset}I
\qquad
\frac{\Gamma \vdash A \supset B\ \mathit{true} \qquad \Gamma \vdash A\ \mathit{true}}{\Gamma \vdash B\ \mathit{true}}\ {\supset}E
$$

Here the introduction rule ${\supset}I$ uses hypothetical judgments to express that a proposition $A \supset B$ is true
whenever $A\ \mathit{true}$ implies $B\ \mathit{true}$; the elimination rule ${\supset}E$ uses hypothetical judgments to express that
$A \supset B\ \mathit{true}$ licenses us to deduce $B\ \mathit{true}$ if $A\ \mathit{true}$ holds. A proof of $\Gamma \vdash J$ with these inference rules
guarantees the existence of a corresponding hypothetical proof of $\Gamma \vdash J$.

A special form of hypothetical judgment $J_1, \cdots, J_n \vdash J_i$ (where the succedent matches an antecedent)
is evident by a vacuous proof. The following inference rule, called the hypothesis rule, expresses
this property of hypothetical judgments; it simply says that any hypothesis can be used:

$$
\frac{}{\Gamma, J \vdash J}\ \mathsf{Hyp}
$$

From now on, we assume that antecedents and succedents in hypothetical judgments are all basic judgments.
For example, we do not consider such hypothetical judgments as $(\Gamma_1 \vdash J_1) \vdash J_2$ and $\Gamma_1 \vdash (\Gamma_2 \vdash J)$.


The  Curry-Howard  isomorphism 

The Curry-Howard isomorphism connects logic and programming languages by representing a proof of a
judgment with a program of a corresponding type. In other words, a well-typed program is a compact
representation of a valid proof under the Curry-Howard isomorphism. Typically we apply the Curry-Howard
isomorphism by translating inference rules of logic into typing rules of a programming language. By
convention, a typing rule is given the same name as the inference rule from which it is derived.

As an example, we consider the logic of truth with the conjunction connective $\wedge$ and the implication
connective $\supset$. Under the Curry-Howard isomorphism, the logic corresponds to the type system of the
$\lambda$-calculus with product types. A proof $\mathcal{D}$ of $A\ \mathit{true}$ is represented with a proof term $M$ of type $A$. Note that
$A$ is interpreted both as a proposition and as a type. We use a judgment $M : A$ to mean that proof term $M$
represents a proof of $A\ \mathit{true}$, or that proof term $M$ has type $A$. Thus we have the following correspondence:

$$
A\ \mathit{true} \iff M : A
$$

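Now consider the use of the inference rule ${\wedge}I$ in constructing a proof $\mathcal{D}$ of $A \wedge B\ \mathit{true}$ from a proof
$\mathcal{D}_A$ of $A\ \mathit{true}$ and a proof $\mathcal{D}_B$ of $B\ \mathit{true}$. When proof terms $M_A$ and $M_B$ represent $\mathcal{D}_A$ and $\mathcal{D}_B$, respectively,
we use a product term $(M_A, M_B)$ of product type $A \wedge B$ to represent $\mathcal{D}$. Thus the inference rule ${\wedge}I$ is
translated into the following typing rule:

$$
\frac{M_A : A \qquad M_B : B}{(M_A, M_B) : A \wedge B}\ {\wedge}I
$$

$$
\begin{array}{ll}
\dfrac{}{\Gamma, A\ \mathit{true} \vdash A\ \mathit{true}}\ \mathsf{Hyp}
& \dfrac{}{\Gamma, x : A \vdash x : A}\ \mathsf{Hyp} \\[3ex]
\dfrac{\Gamma \vdash A\ \mathit{true} \qquad \Gamma \vdash B\ \mathit{true}}{\Gamma \vdash A \wedge B\ \mathit{true}}\ {\wedge}I
& \dfrac{\Gamma \vdash M : A \qquad \Gamma \vdash N : B}{\Gamma \vdash (M, N) : A \wedge B}\ {\wedge}I \\[3ex]
\dfrac{\Gamma \vdash A \wedge B\ \mathit{true}}{\Gamma \vdash A\ \mathit{true}}\ {\wedge}E_L
& \dfrac{\Gamma \vdash M : A \wedge B}{\Gamma \vdash \mathsf{fst}\ M : A}\ {\wedge}E_L \\[3ex]
\dfrac{\Gamma \vdash A \wedge B\ \mathit{true}}{\Gamma \vdash B\ \mathit{true}}\ {\wedge}E_R
& \dfrac{\Gamma \vdash M : A \wedge B}{\Gamma \vdash \mathsf{snd}\ M : B}\ {\wedge}E_R \\[3ex]
\dfrac{\Gamma, A\ \mathit{true} \vdash B\ \mathit{true}}{\Gamma \vdash A \supset B\ \mathit{true}}\ {\supset}I
& \dfrac{\Gamma, x : A \vdash M : B}{\Gamma \vdash \lambda x{:}A.\, M : A \supset B}\ {\supset}I \\[3ex]
\dfrac{\Gamma \vdash A \supset B\ \mathit{true} \qquad \Gamma \vdash A\ \mathit{true}}{\Gamma \vdash B\ \mathit{true}}\ {\supset}E
& \dfrac{\Gamma \vdash M : A \supset B \qquad \Gamma \vdash N : A}{\Gamma \vdash M\ N : B}\ {\supset}E
\end{array}
$$

Figure 2.1: Translation of inference rules for hypothetical judgments into typing rules.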


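We use projection terms $\mathsf{fst}\ M$ and $\mathsf{snd}\ M$ in translating the rules ${\wedge}E_L$ and ${\wedge}E_R$:

$$
\frac{M : A \wedge B}{\mathsf{fst}\ M : A}\ {\wedge}E_L
\qquad
\frac{M : A \wedge B}{\mathsf{snd}\ M : B}\ {\wedge}E_R
$$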


When a hypothetical proof uses $A\ \mathit{true}$ as a hypothesis, it assumes the existence of a proof. Since
such a proof is actually unknown, it cannot be represented with a concrete proof term. Hence it is
represented with a variable $x$, a special proof term which can be replaced by another proof term. Then a proof
$\mathcal{D}$ of $A_1\ \mathit{true}, \cdots, A_n\ \mathit{true} \vdash A\ \mathit{true}$ is represented with a proof term $M$ satisfying a hypothetical
judgment $x_1 : A_1, \cdots, x_n : A_n \vdash M : A$, which means that proof term $M$ has type $A$ under the assumption that
variable $x_i$, $1 \le i \le n$, has type $A_i$.

$$
\begin{array}{c} \mathcal{D} \\ A_1\ \mathit{true}, \cdots, A_n\ \mathit{true} \vdash A\ \mathit{true} \end{array}
\iff
x_1 : A_1, \cdots, x_n : A_n \vdash M : A
$$

We refer to a collection of judgments $x_1 : A_1, \cdots, x_n : A_n$ as a typing context. As with collections of
antecedents, we abbreviate typing contexts as $\Gamma$; all variables in a typing context are assumed to be distinct.

With the correspondence of hypothetical judgments above, inference rules for hypothetical judgments
in logic are translated into typing rules for hypothetical judgments $\Gamma \vdash M : A$. For example, the inference
rules ${\supset}I$ and ${\supset}E$ are translated into the following typing rules, which use a lambda abstraction $\lambda x{:}A.\, M$
and a lambda application $M\ N$ as proof terms:

$$
\frac{\Gamma, x : A \vdash M : B}{\Gamma \vdash \lambda x{:}A.\, M : A \supset B}\ {\supset}I
\qquad
\frac{\Gamma \vdash M : A \supset B \qquad \Gamma \vdash N : A}{\Gamma \vdash M\ N : B}\ {\supset}E
$$

Figure 2.1 shows inference rules for hypothetical judgments in logic (in the left column) and
their translation into typing rules (in the right column). The hypothesis rule Hyp is
translated into a typing rule, also called the hypothesis rule, that typechecks a variable. The typing rules in
the right column constitute the type system of the $\lambda$-calculus with product types.
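The typing rules of Figure 2.1 can be read as a recursive procedure over proof terms. The following is a minimal sketch in Python, not part of the original development: the class names (`Atom`, `Prod`, `Arrow`, `Var`, `Pair`, `Fst`, `Snd`, `Lam`, `App`) and the function `typeof` are hypothetical, and a typing context is assumed to be a dictionary from variable names to types.

```python
from dataclasses import dataclass

# Types: atomic propositions, products (A ∧ B), and functions (A ⊃ B).
@dataclass(frozen=True)
class Atom:
    name: str

@dataclass(frozen=True)
class Prod:
    left: object
    right: object

@dataclass(frozen=True)
class Arrow:
    dom: object
    cod: object

# Proof terms of the λ-calculus with product types.
@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Pair:
    fst: object
    snd: object

@dataclass(frozen=True)
class Fst:
    arg: object

@dataclass(frozen=True)
class Snd:
    arg: object

@dataclass(frozen=True)
class Lam:
    var: str
    ty: object
    body: object

@dataclass(frozen=True)
class App:
    fun: object
    arg: object


def typeof(ctx, term):
    """Return the type A such that ctx ⊢ term : A, or raise TypeError."""
    if isinstance(term, Var):                      # rule Hyp
        if term.name not in ctx:
            raise TypeError(f"unbound variable {term.name}")
        return ctx[term.name]
    if isinstance(term, Pair):                     # rule ∧I
        return Prod(typeof(ctx, term.fst), typeof(ctx, term.snd))
    if isinstance(term, (Fst, Snd)):               # rules ∧E_L and ∧E_R
        ty = typeof(ctx, term.arg)
        if not isinstance(ty, Prod):
            raise TypeError("projection from a non-product")
        return ty.left if isinstance(term, Fst) else ty.right
    if isinstance(term, Lam):                      # rule ⊃I
        body_ty = typeof({**ctx, term.var: term.ty}, term.body)
        return Arrow(term.ty, body_ty)
    if isinstance(term, App):                      # rule ⊃E
        fun_ty = typeof(ctx, term.fun)
        if not isinstance(fun_ty, Arrow) or typeof(ctx, term.arg) != fun_ty.dom:
            raise TypeError("ill-typed application")
        return fun_ty.cod
    raise TypeError("unknown term")
```

For instance, with `A, B = Atom("A"), Atom("B")`, the term `Lam("p", Prod(A, B), Pair(Snd(Var("p")), Fst(Var("p"))))` is assigned the type `Arrow(Prod(A, B), Prod(B, A))` in the empty context, which corresponds to a proof of $(A \wedge B) \supset (B \wedge A)$. Each branch is selected by the shape of the term alone, so the checker never has to search for a rule.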

As a hypothetical proof can be substituted into another hypothetical proof, a proof term can also be
substituted into another proof term. Suppose $\Gamma \vdash M : A$ and $\Gamma, x : A \vdash N : B$. $M$ and $N$ represent
hypothetical proofs $\mathcal{D}$ and $\mathcal{E}$ of $\Gamma \vdash A\ \mathit{true}$ and $\Gamma, A\ \mathit{true} \vdash B\ \mathit{true}$, respectively, where we use the same symbol
$\Gamma$ for the collection of antecedents corresponding to the typing context $\Gamma$. If we replace all occurrences of $x$
in $N$ by $M$, we obtain a proof term, written as $[M/x]N$, which contains no occurrence of $x$. The substitution
principle for proof terms states that $[M/x]N$ represents the hypothetical proof $[\mathcal{D}/A\ \mathit{true}]\mathcal{E}$ of $\Gamma \vdash B\ \mathit{true}$:

• (Substitution principle)
If $\Gamma \vdash M : A$ and $\Gamma, x : A \vdash N : B$, then $\Gamma \vdash [M/x]N : B$.

$A\ \mathit{true}$ and $\Gamma \vdash A\ \mathit{true}$ are called synthetic judgments because no prior information on their proofs is
given and we search for, or synthesize, their proofs from inference rules. In contrast, $M : A$ and $\Gamma \vdash M : A$
are called analytic judgments because their proofs are already represented in $M$ and can be reconstructed
by analyzing $M$. To prove $M : A$ or $\Gamma \vdash M : A$ with typing rules, we only have to analyze $M$ because it
determines which typing rule should be applied to deduce $M : A$ or $\Gamma \vdash M : A$. For example, if $M$ is a
product term (i.e., $M = (M_1, M_2)$), a deduction of $\Gamma \vdash M : A$ always ends with an application of the typing
rule ${\wedge}I$. For this reason, a deduction of $M : A$ or $\Gamma \vdash M : A$ is often called a derivation rather than a proof.

When we construct a (unique) derivation of $M : A$ or $\Gamma \vdash M : A$, we check if $M$ indeed represents a
proof of $A\ \mathit{true}$, rather than searching for a yet unknown proof. Such a derivation effectively typechecks $M$
by testing if $M$ indeed has type $A$, and we call $M : A$ and $\Gamma \vdash M : A$ typing judgments.


Reduction  and  expansion  rules 


All  the  inference  rules  presented  so  far  make  sense  intuitively,  but  their  correctness  is  yet  to  be  established 
in  a  formal  way.  To  this  end,  we  show  that  the  inference  rules  satisfy  two  properties:  local  soundness  and 
local  completeness.  Under  the  Curry-Howard  isomorphism,  the  two  properties  correspond  to  reduction  and 
expansion  rules  for  proof  terms,  thus  culminating  in  a  foundation  for  operational  semantics  of  programming 
languages. 

An introduction rule compresses the knowledge expressed in its premises into a truth judgment in the
conclusion, whereas an elimination rule retrieves the knowledge compressed within a truth judgment in a
premise to deduce another judgment in the conclusion. The local soundness property states that the
knowledge retrieved from a judgment by an elimination rule is only part of the knowledge compressed within that
judgment. Therefore, if the local soundness property fails, the elimination rule is too strong in the sense
that it is capable of contriving some knowledge that cannot be justified by that judgment. The local
completeness property states that the knowledge retrieved from a judgment by an elimination rule includes at
least the knowledge compressed within that judgment. Therefore, if the local completeness property fails,
the elimination rule is too weak in the sense that it is incapable of retrieving all the knowledge compressed
within that judgment. If an elimination rule satisfies both properties, it retrieves exactly the same knowledge
compressed within a judgment in a premise.

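We verify the local soundness property by showing how to reduce a proof in which an introduction rule
is immediately followed by a corresponding elimination rule. As an example, consider the following proof
for the conjunction connective $\wedge$:

$$
\dfrac{\dfrac{\begin{array}{c} \mathcal{D} \\ A\ \mathit{true} \end{array} \qquad \begin{array}{c} \mathcal{E} \\ B\ \mathit{true} \end{array}}{A \wedge B\ \mathit{true}}\ {\wedge}I}{A\ \mathit{true}}\ {\wedge}E_L
$$

The elimination rule ${\wedge}E_L$ is not too strong because what it deduces in the conclusion, namely $A\ \mathit{true}$, is one
of the two judgments used to deduce $A \wedge B\ \mathit{true}$. Hence the whole proof reduces to a simpler proof $\mathcal{D}$:

$$
\dfrac{\dfrac{\begin{array}{c} \mathcal{D} \\ A\ \mathit{true} \end{array} \qquad \begin{array}{c} \mathcal{E} \\ B\ \mathit{true} \end{array}}{A \wedge B\ \mathit{true}}\ {\wedge}I}{A\ \mathit{true}}\ {\wedge}E_L
\quad \Longrightarrow_R \quad
\begin{array}{c} \mathcal{D} \\ A\ \mathit{true} \end{array}
$$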
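If the elimination rule were too strong (e.g., deducing $A \supset B\ \mathit{true}$ somehow), the proof would not be
reducible. As another example, consider the proof for the implication connective $\supset$:

$$
\dfrac{\dfrac{\begin{array}{c} \mathcal{D} \\ \Gamma, A\ \mathit{true} \vdash B\ \mathit{true} \end{array}}{\Gamma \vdash A \supset B\ \mathit{true}}\ {\supset}I \qquad \begin{array}{c} \mathcal{E} \\ \Gamma \vdash A\ \mathit{true} \end{array}}{\Gamma \vdash B\ \mathit{true}}\ {\supset}E
$$

By the substitution principle, the whole proof reduces to a simpler proof $[\mathcal{E}/A\ \mathit{true}]\mathcal{D}$:

$$
\dfrac{\dfrac{\begin{array}{c} \mathcal{D} \\ \Gamma, A\ \mathit{true} \vdash B\ \mathit{true} \end{array}}{\Gamma \vdash A \supset B\ \mathit{true}}\ {\supset}I \qquad \begin{array}{c} \mathcal{E} \\ \Gamma \vdash A\ \mathit{true} \end{array}}{\Gamma \vdash B\ \mathit{true}}\ {\supset}E
\quad \Longrightarrow_R \quad
\begin{array}{c} [\mathcal{E}/A\ \mathit{true}]\mathcal{D} \\ \Gamma \vdash B\ \mathit{true} \end{array}
$$

We refer to these reductions $\Longrightarrow_R$ as local reductions.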

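We verify the local completeness property by showing how to expand a proof of a judgment into another
proof in which one or more elimination rules are followed by an introduction rule for the same judgment.
As an example, consider a proof $\mathcal{D}$ of $A \wedge B\ \mathit{true}$. The elimination rules ${\wedge}E_L$ and ${\wedge}E_R$ are not too weak
because what they deduce in their conclusions, namely $A\ \mathit{true}$ and $B\ \mathit{true}$, are sufficient to reconstruct
another proof of $A \wedge B\ \mathit{true}$:

$$
\begin{array}{c} \mathcal{D} \\ A \wedge B\ \mathit{true} \end{array}
\quad \Longrightarrow_E \quad
\dfrac{\dfrac{\begin{array}{c} \mathcal{D} \\ A \wedge B\ \mathit{true} \end{array}}{A\ \mathit{true}}\ {\wedge}E_L \qquad \dfrac{\begin{array}{c} \mathcal{D} \\ A \wedge B\ \mathit{true} \end{array}}{B\ \mathit{true}}\ {\wedge}E_R}{A \wedge B\ \mathit{true}}\ {\wedge}I
$$

If the elimination rules were too weak (e.g., being unable to deduce $A\ \mathit{true}$ somehow), the proof would not
be expandable. As another example, consider a proof $\mathcal{D}$ of $\Gamma \vdash A \supset B\ \mathit{true}$. By the weakening property,
$\mathcal{D}$ is also a proof of $\Gamma, A\ \mathit{true} \vdash A \supset B\ \mathit{true}$. Then we can reconstruct another proof of $\Gamma \vdash A \supset B\ \mathit{true}$ by
expanding $\mathcal{D}$:

$$
\begin{array}{c} \mathcal{D} \\ \Gamma \vdash A \supset B\ \mathit{true} \end{array}
\quad \Longrightarrow_E \quad
\dfrac{\dfrac{\begin{array}{c} \mathcal{D} \\ \Gamma, A\ \mathit{true} \vdash A \supset B\ \mathit{true} \end{array} \qquad \dfrac{}{\Gamma, A\ \mathit{true} \vdash A\ \mathit{true}}\ \mathsf{Hyp}}{\Gamma, A\ \mathit{true} \vdash B\ \mathit{true}}\ {\supset}E}{\Gamma \vdash A \supset B\ \mathit{true}}\ {\supset}I
$$

We refer to these expansions $\Longrightarrow_E$ as local expansions.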

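Since proof terms are essentially proofs, local reductions and expansions induce reduction and expansion
rules for proof terms:

$$
\begin{array}{rcl}
\mathsf{fst}\ (M, N) & \Longrightarrow_R & M \\
\mathsf{snd}\ (M, N) & \Longrightarrow_R & N \\
(\lambda x{:}A.\, M)\ N & \Longrightarrow_R & [N/x]M \\
M : A \wedge B & \Longrightarrow_E & (\mathsf{fst}\ M, \mathsf{snd}\ M) \\
M : A \supset B & \Longrightarrow_E & \lambda x{:}A.\, M\ x
\end{array}
$$

Note that these reduction and expansion rules preserve the type of a given proof term. That is, if $M \Longrightarrow_R N$
or $M \Longrightarrow_E N$, then $\Gamma \vdash M : A$ implies $\Gamma \vdash N : A$. The reduction rules are called the $\beta$-reduction rules,
and the expansion rules are called the $\eta$-expansion rules.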
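As a rough illustration, the $\beta$-reduction and $\eta$-expansion rules can be written as functions on proof terms. The sketch below is only one possible encoding: terms are assumed to be nested tuples such as `("var", x)`, `("pair", M, N)`, `("fst", M)`, `("snd", M)`, `("lam", x, A, M)` and `("app", M, N)`, types are `("and", A, B)` or `("imp", A, B)`, and the names `subst`, `reduce_root` and `expand_root` are hypothetical. Substitution assumes that bound variable names never clash with the free variables of the substituted term, so it does no renaming.

```python
def subst(m, x, n):
    """Return [m/x]n, assuming bound names in n do not clash with free names of m."""
    tag = n[0]
    if tag == "var":
        return m if n[1] == x else n
    if tag == "lam":
        _, y, ty, body = n
        # If y == x, the variable x is rebound and has no free occurrence below.
        return n if y == x else ("lam", y, ty, subst(m, x, body))
    # "pair", "fst", "snd", "app": substitute in every subterm.
    return (tag,) + tuple(subst(m, x, sub) for sub in n[1:])


def reduce_root(t):
    """Apply one β-reduction rule at the root of t, or return None if none applies."""
    if t[0] == "fst" and t[1][0] == "pair":
        return t[1][1]                      # fst (M, N)  ==>  M
    if t[0] == "snd" and t[1][0] == "pair":
        return t[1][2]                      # snd (M, N)  ==>  N
    if t[0] == "app" and t[1][0] == "lam":
        _, x, _ty, body = t[1]
        return subst(t[2], x, body)         # (λx:A. M) N  ==>  [N/x]M
    return None


def expand_root(t, ty, fresh="x"):
    """η-expand t at type ty; `fresh` is assumed not to occur free in t."""
    if ty[0] == "and":
        return ("pair", ("fst", t), ("snd", t))
    if ty[0] == "imp":
        return ("lam", fresh, ty[1], ("app", t, ("var", fresh)))
    return t
```

For example, `reduce_root(("fst", ("pair", ("var", "a"), ("var", "b"))))` returns `("var", "a")`, and `expand_root(("var", "f"), ("imp", "A", "B"))` returns `("lam", "x", "A", ("app", ("var", "f"), ("var", "x")))`.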

In a programming language based upon the $\lambda$-calculus, a program is defined as a well-typed closed
proof term, that is, a proof term $M$ such that $\cdot \vdash M : A$ for a certain type $A$. Usually we run a program
by applying reduction rules under a specific reduction strategy. For example, the call-by-name reduction
strategy reduces a program $(\lambda x{:}A.\, M)\ N$ to $[N/x]M$ (by the $\beta$-reduction rule) regardless of the form of
term $N$. In contrast, the call-by-value reduction strategy reduces $(\lambda x{:}A.\, M)\ N$ to $[N/x]M$ only if no
reduction rule is applicable to $N$ (i.e., $N$ is a value). Thus the operational semantics of a programming
language based upon the $\lambda$-calculus is specified by the reduction strategy for applying reduction rules.
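The difference between the two strategies can be seen on a small program. The sketch below is an illustration under simplifying assumptions: it covers only variables, abstractions `("lam", x, A, M)` and applications `("app", M, N)`, it treats abstractions as the only values, and the names `subst`, `step` and `trace` are hypothetical.

```python
def subst(m, x, n):
    """Return [m/x]n for the fragment with variables, abstractions and applications."""
    tag = n[0]
    if tag == "var":
        return m if n[1] == x else n
    if tag == "lam":
        return n if n[1] == x else ("lam", n[1], n[2], subst(m, x, n[3]))
    return ("app", subst(m, x, n[1]), subst(m, x, n[2]))


def step(t, by_value):
    """Perform one reduction step under call-by-value or call-by-name, or return None."""
    if t[0] != "app":
        return None
    fun, arg = t[1], t[2]
    if fun[0] != "lam":
        # Reduce the function position first.
        nxt = step(fun, by_value)
        return None if nxt is None else ("app", nxt, arg)
    if by_value and arg[0] != "lam":
        # Call-by-value: the argument must be reduced to a value before substitution.
        nxt = step(arg, by_value)
        return None if nxt is None else ("app", fun, nxt)
    return subst(arg, fun[1], fun[3])


def trace(t, by_value):
    """Return the list of terms visited while reducing t until no step applies."""
    out = [t]
    while (t := step(t, by_value)) is not None:
        out.append(t)
    return out


ident = ("lam", "y", "A", ("var", "y"))
drop = ("lam", "x", "A", ("lam", "z", "A", ("var", "z")))
prog = ("app", drop, ("app", ident, ident))
```

Here `trace(prog, by_value=False)` takes a single step, because call-by-name substitutes the unreduced argument, which `drop` then discards. `trace(prog, by_value=True)` takes two steps, because the argument `ident ident` is first reduced to `ident`. Both traces end in the same term `("lam", "z", "A", ("var", "z"))`.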
2.2.2 Semantics of modal logic

Modal logic is a form of logic in which truth may be qualified by modalities. Examples of modalities
common in the literature are the necessity modality $\Box$ and the possibility modality $\Diamond$. Informally “$\Box A$ is
true” means “$A$ is necessarily true,” and “$\Diamond A$ is true” means “$A$ is possibly true.” Thus modal logic is
more expressive than ordinary logic without modalities, and when applied to the design of a programming
language, it enables the type system to specify richer properties that would otherwise be difficult to specify.

One popular way to explain the semantics of modal logic is the possible world interpretation [35, 71]. It
assumes a set of worlds and relativizes truth to worlds. That is, instead of ordinary truth “$A$ is true,” it uses
relative truth “$A$ is true at world $\omega$” as the primitive notion. Hence the same proposition may be true at one
world but not at another world.

The possible world interpretation also assumes an accessibility relation $\le$ between worlds to explain the
meaning of each modality. For example, the necessity and possibility modalities are defined as follows:

• $\Box A$ is true at world $\omega$ if for every world $\omega'$ accessible from $\omega$ (i.e., $\omega \le \omega'$), $A$ is true at $\omega'$.

• $\Diamond A$ is true at world $\omega$ if $A$ is true at some world $\omega'$ accessible from $\omega$ (i.e., $\omega \le \omega'$).

Ordinary connectives (such as $\supset$ and $\wedge$) are explained locally at individual worlds, irrespective of $\le$. For
example, $A \supset B$ is true at world $\omega$ if “$A$ is true at $\omega$” implies “$B$ is true at $\omega$.”
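The two clauses for $\Box$ and $\Diamond$ translate directly into a truth evaluator over a finite set of worlds. The following sketch is illustrative only: formulas are assumed to be nested tuples such as `("atom", "A")`, `("imp", F, G)`, `("box", F)` and `("dia", F)`, the accessibility relation is a set of pairs, the valuation maps each world to the set of atoms true there, and implication is read classically at each world. The function name `holds` is hypothetical.

```python
def holds(formula, world, access, valuation):
    """Decide whether `formula` is true at `world` in a finite model.

    access: set of pairs (w, w2) meaning that w2 is accessible from w.
    valuation: dict mapping each world to the set of atoms true at it.
    """
    tag = formula[0]
    if tag == "atom":
        return formula[1] in valuation[world]
    if tag == "and":
        return (holds(formula[1], world, access, valuation)
                and holds(formula[2], world, access, valuation))
    if tag == "imp":
        # Explained locally at `world`, irrespective of the accessibility relation.
        return (not holds(formula[1], world, access, valuation)
                or holds(formula[2], world, access, valuation))
    successors = [w2 for (w1, w2) in access if w1 == world]
    if tag == "box":
        return all(holds(formula[1], w2, access, valuation) for w2 in successors)
    if tag == "dia":
        return any(holds(formula[1], w2, access, valuation) for w2 in successors)
    raise ValueError(f"unknown connective {tag!r}")


# A two-world model in which the atom A is true at world 0 only.
access = {(0, 0), (0, 1), (1, 1)}
valuation = {0: {"A"}, 1: set()}
A = ("atom", "A")
```

In this model `holds(("dia", A), 0, access, valuation)` is true, since world 0 is accessible from itself, while `holds(("box", A), 0, access, valuation)` is false, since $A$ fails at the accessible world 1.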

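With the above definition of the modalities $\Box$ and $\Diamond$, some proposition becomes true at every world,
regardless of the accessibility relation $\le$. For example, $\Box(A \supset B) \supset (\Box A \supset \Box B)$ is true at every world,
since $\Box(A \supset B)$ and $\Box A$ are sufficient to show that $B$ is true at any accessible world. Moreover various
systems of modal logic are obtained by requiring $\le$ to satisfy certain properties. The following table shows
some properties of $\le$ and corresponding propositions that become true at every world:

$$
\begin{array}{lll}
\text{property of } \le & & \text{proposition} \\
\hline
\text{reflexivity} & \forall\omega.\ \omega \le \omega & \Box A \supset A \\
\text{symmetry} & \forall\omega.\forall\omega'.\ \omega \le \omega' \text{ implies } \omega' \le \omega & A \supset \Box\Diamond A \\
\text{transitivity} & \forall\omega.\forall\omega'.\forall\omega''.\ \omega \le \omega' \text{ and } \omega' \le \omega'' \text{ imply } \omega \le \omega'' & \Box A \supset \Box\Box A \\
\text{Euclideanness} & \forall\omega.\forall\omega'.\forall\omega''.\ \omega \le \omega' \text{ and } \omega \le \omega'' \text{ imply } \omega' \le \omega'' & \Diamond A \supset \Box\Diamond A
\end{array}
$$

For example, if $\le$ is reflexive and transitive, we obtain a system of modal logic, usually referred to as S4, in
which both $\Box A \supset A$ and $\Box A \supset \Box\Box A$ are true at every world.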
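The four properties in the table are ordinary properties of a binary relation and can be checked mechanically on a finite set of worlds. The sketch below assumes the relation is given as a set of pairs; the function names are hypothetical.

```python
def is_reflexive(worlds, rel):
    """Every world is accessible from itself."""
    return all((w, w) in rel for w in worlds)


def is_symmetric(rel):
    """If b is accessible from a, then a is accessible from b."""
    return all((b, a) in rel for (a, b) in rel)


def is_transitive(rel):
    """If a ≤ b and b ≤ c, then a ≤ c."""
    return all((a, c) in rel for (a, b) in rel for (b2, c) in rel if b == b2)


def is_euclidean(rel):
    """If a ≤ b and a ≤ c, then b ≤ c."""
    return all((b, c) in rel for (a, b) in rel for (a2, c) in rel if a == a2)


# Three worlds ordered linearly: 0 ≤ 1 ≤ 2.
worlds = {0, 1, 2}
leq = {(a, b) for a in worlds for b in worlds if a <= b}
```

The relation `leq` is reflexive and transitive, so it satisfies the conditions for S4, but it is neither symmetric nor Euclidean: world 1 is accessible from world 0, yet world 0 is not accessible from world 1.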

The semantics of modal logic can also be explained without explicitly using the notion of world [62, 8,
60]. In their judgmental formulation of modal logic, Pfenning and Davies [60] define a validity judgment
$A\ \mathit{valid}$ as $\cdot \vdash A\ \mathit{true}$, and internalize $A\ \mathit{valid}$ as a modal proposition $\Box A$:

$$
\frac{A\ \mathit{valid}}{\Box A\ \mathit{true}}
$$

Thus $\Box A\ \mathit{true}$ is interpreted as $A$ being true at a world about which we know nothing, or equivalently, at
every world. (Note that a judgment is defined first and then a corresponding modality is introduced.) A
possibility judgment $A\ \mathit{poss}$ is based upon the interpretation of $A\ \mathit{poss}$ as $A$ being true at a certain world,
but still its definition does not use worlds explicitly:

1. If $\Gamma \vdash A\ \mathit{true}$, then $\Gamma \vdash A\ \mathit{poss}$.

2. If $\Gamma \vdash A\ \mathit{poss}$ and $A\ \mathit{true} \vdash B\ \mathit{poss}$, then $\Gamma \vdash B\ \mathit{poss}$.

A possibility judgment $A\ \mathit{poss}$ is internalized as a modal proposition $\Diamond A$:

$$
\frac{A\ \mathit{poss}}{\Diamond A\ \mathit{true}}
$$
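The possible world interpretation is richer than the judgmental formulation in that some proposition
is true in the possible world interpretation but not in the judgmental formulation. An example of such a
proposition is $(\Diamond A \supset \Box B) \supset \Box(A \supset B)$. It is true in the possible world interpretation as follows; we write
$A\ @\ \omega$ for $A$ being true at world $\omega$:

$$
\dfrac{
  \dfrac{
    \dfrac{
      \dfrac{
        \dfrac{}{\Diamond A \supset \Box B\ @\ \omega,\ A\ @\ \omega' \vdash \Diamond A \supset \Box B\ @\ \omega}\ \mathsf{Hyp}
        \qquad
        \dfrac{
          \dfrac{}{\Diamond A \supset \Box B\ @\ \omega,\ A\ @\ \omega' \vdash A\ @\ \omega'}\ \mathsf{Hyp}
        }{\Diamond A \supset \Box B\ @\ \omega,\ A\ @\ \omega' \vdash \Diamond A\ @\ \omega}\ \Diamond I
      }{\Diamond A \supset \Box B\ @\ \omega,\ A\ @\ \omega' \vdash B\ @\ \omega'}\ {\supset}E
    }{\omega \le \omega',\ \Diamond A \supset \Box B\ @\ \omega \vdash A \supset B\ @\ \omega'}\ {\supset}I
  }{\Diamond A \supset \Box B\ @\ \omega \vdash \Box(A \supset B)\ @\ \omega}\ \Box I
}{\cdot \vdash (\Diamond A \supset \Box B) \supset \Box(A \supset B)\ @\ \omega}\ {\supset}I
$$

Its truth is, however, not provable in the judgmental formulation:

$$
\dfrac{
  \dfrac{
    \dfrac{???}{\cdot \vdash A \supset B\ \mathit{true}}
  }{\Diamond A \supset \Box B\ \mathit{true} \vdash \Box(A \supset B)\ \mathit{true}}
}{\cdot \vdash (\Diamond A \supset \Box B) \supset \Box(A \supset B)\ \mathit{true}}
$$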

In a certain sense, the possible world interpretation is inherently more expressive than the judgmental
formulation because it explicitly specifies the world at which a proposition is true. On the other hand, it may
not be a good basis for the type system of a programming language, since the use of the accessibility relation
in proofs implies that the type system also needs to reason about the relation between worlds, which can be
difficult depending on the concrete notion of world chosen by the type system. The judgmental formulation
lends itself well to this purpose because it does not use worlds explicitly in the inference rules.

The logic for $\lambda_{\bigcirc}$ combines the possible world interpretation and the judgmental style by assuming an
accessibility relation between worlds and relativizing all judgments to worlds. For example, it uses a truth
judgment of the form $A\ \mathit{true}\ @\ \omega$ to mean that $A$ is true at world $\omega$. Its inference rules, however, do not
use judgments showing accessibility between two worlds, as is the case in the judgmental formulation of
modal logic (see Simpson [71] for a system of modal logic which uses such judgments in inference rules).
Instead it requires the accessibility relation to satisfy a certain condition (monotonicity), which eliminates
the need for such judgments in inference rules. Since the possible world interpretation in $\lambda_{\bigcirc}$ is to use the
same worlds that are part of the run-time system, lack of such judgments in inference rules implies that the
type system of $\lambda_{\bigcirc}$ does not explicitly model changes in the run-time system, as is the case in a typical type
system.


2.3 Language $\lambda_{\bigcirc}$

Pfenning and Davies [60] present a monadic language which reformulates Moggi’s monadic metalanguage
$\lambda_{ml}$ [44, 45]. It applies the Curry-Howard isomorphism to lax logic formulated in the judgmental style (with
a lax truth judgment $A\ \mathit{lax}$):

1. If $\Gamma \vdash A\ \mathit{true}$, then $\Gamma \vdash A\ \mathit{lax}$.

2. If $\Gamma \vdash A\ \mathit{lax}$ and $\Gamma, A\ \mathit{true} \vdash B\ \mathit{lax}$, then $\Gamma \vdash B\ \mathit{lax}$.

$\lambda_{\bigcirc}$ is essentially the monadic language of Pfenning and Davies coalesced with the possible world
interpretation. The difference is that in $\lambda_{\bigcirc}$, the definition of each judgment relies only on truth and the
accessibility relation, instead of clauses describing its properties (such as the above two clauses). In other
words, the definition of each judgment directly conveys its intuitive meaning.
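Under the Curry-Howard isomorphism, the two clauses for lax truth are commonly read as the two operations of a monad: the first turns a value into a computation, and the second sequences a computation with a continuation that may use its result. The following sketch illustrates that reading only; it is not the formulation used in this dissertation. A computation is assumed to be a Python function from a world to a pair of a result and a new world, a world is assumed to be a tuple of recorded effects, and the names `unit`, `bind` and `tick` are hypothetical.

```python
def unit(value):
    """Clause 1: a value gives a computation that leaves the world unchanged."""
    return lambda world: (value, world)


def bind(comp, k):
    """Clause 2: run `comp`, then pass its result to `k`, which yields another computation."""
    def run(world):
        value, world2 = comp(world)
        return k(value)(world2)
    return run


def tick(world):
    """A sample effect: return the number of effects so far and record one more."""
    return (len(world), world + ("tick",))


# Run `tick` twice and return the sum of the two results.
prog = bind(tick, lambda n: bind(tick, lambda m: unit(n + m)))
```

Starting from the empty world, `prog(())` returns `(1, ("tick", "tick"))`: the first `tick` yields 0, the second yields 1, and the final world records both effects.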

2.3.1 Logic for $\lambda_{\bigcirc}$

The development of $\lambda_{\bigcirc}$ begins by formulating the logic for $\lambda_{\bigcirc}$. Since the logic for $\lambda_{\bigcirc}$ uses the possible
world interpretation, we first define an accessibility relation $\le$ between worlds. Now a world refers to the
same notion that describes part of the run-time system.

Definition 2.1. A world $\omega'$ is accessible from another world $\omega$, written as $\omega \le \omega'$, if there exists a world
effect that causes a transition from $\omega$ to $\omega'$.

As it describes transitions between worlds when world effects are produced, the accessibility relation $\le$
is a temporal relation between worlds. If $\omega \le \omega'$, we say that $\omega'$ is a future world of $\omega$ and that $\omega$ is a past
world of $\omega'$. Note that $\le$ is reflexive and transitive, since a vacuous world effect causes a transition to the
same world and the combination of two world effects can be regarded as a single world effect.

The logic for $\lambda_{\bigcirc}$ uses two kinds of basic judgments, both of which are relativized to worlds:

• A truth judgment $A\ \mathit{true}\ @\ \omega$ means that $A$ is true at world $\omega$.

• A computability judgment $A\ \mathit{comp}\ @\ \omega$ means that $A$ is true at some future world of $\omega$, that is,
$A\ \mathit{true}\ @\ \omega'$ holds where $\omega \le \omega'$.

A truth judgment $A\ \mathit{true}\ @\ \omega$ represents a known fact about world $\omega$. Since a future world can be reached
only by producing some world effect, a computability judgment $A\ \mathit{comp}\ @\ \omega$ may be interpreted as meaning
that $A$ becomes true after producing some world effect at world $\omega$.
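The two judgments can be illustrated on a small finite model. In the sketch below, which is only an illustration, a world is assumed to be a frozen set of known facts, world effects are given as a dictionary from a world to the worlds it can move to in one step, and the names `future_worlds`, `true_at` and `comp_at` are hypothetical. Accessibility is taken to be reachability in zero or more steps, which makes it reflexive and transitive.

```python
def future_worlds(world, effects):
    """Return all worlds reachable from `world` by zero or more world effects."""
    seen, frontier = {world}, [world]
    while frontier:
        w = frontier.pop()
        for w2 in effects.get(w, ()):
            if w2 not in seen:
                seen.add(w2)
                frontier.append(w2)
    return seen


def true_at(fact, world):
    """A true @ world: the fact is known at this world."""
    return fact in world


def comp_at(fact, world, effects):
    """A comp @ world: the fact is true at some future world of `world`."""
    return any(true_at(fact, w) for w in future_worlds(world, effects))


# Three worlds: nothing allocated, one cell allocated, cell allocated and file opened.
w0 = frozenset()
w1 = frozenset({"cell"})
w2 = frozenset({"cell", "file"})
effects = {w0: [w1], w1: [w2]}
```

Here `true_at("file", w0)` is false but `comp_at("file", w0, effects)` is true, because `w2` is a future world of `w0`. Since every world is a future world of itself, `true_at(fact, w)` always implies `comp_at(fact, w, effects)`.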

The following properties of hypothetical judgments characterize truth judgments, where $J$ is either a
truth judgment or a computability judgment:

Characterization of truth judgments

1. $\Gamma, A\ \mathit{true}\ @\ \omega \vdash A\ \mathit{true}\ @\ \omega$.

2. If $\Gamma \vdash A\ \mathit{true}\ @\ \omega$ and $\Gamma, A\ \mathit{true}\ @\ \omega \vdash J$, then $\Gamma \vdash J$.

The first clause expresses that $A\ \mathit{true}\ @\ \omega$ may be used as a hypothesis. The second clause expresses the
substitution principle for truth judgments.

The definition of computability judgments gives the following characterization, which is an adaptation
of the characterization of lax truth for the possible world interpretation:

Characterization of computability judgments

1. If $\Gamma \vdash A\ \mathit{true}\ @\ \omega$, then $\Gamma \vdash A\ \mathit{comp}\ @\ \omega$.

2. If $\Gamma \vdash A\ \mathit{comp}\ @\ \omega$ and $\Gamma, A\ \mathit{true}\ @\ \omega' \vdash B\ \mathit{comp}\ @\ \omega'$ for any world $\omega'$ such that $\omega \le \omega'$,
then $\Gamma \vdash B\ \mathit{comp}\ @\ \omega$.

The first clause expresses that if $A$ is true at $\omega$, then $A$ becomes true without producing any world effect at
$\omega$. It follows from the reflexivity of $\le$: if $A\ \mathit{true}\ @\ \omega$ holds, then $A$ is true at $\omega$, which is accessible from $\omega$
itself, and hence $A\ \mathit{comp}\ @\ \omega$ holds. The second clause expresses that if $A$ is true at $\omega'$ after producing some
world effect at $\omega$, we may use $A\ \mathit{true}\ @\ \omega'$ as a hypothesis in deducing a judgment at $\omega'$. If the judgment at
$\omega'$ is a computability judgment $B\ \mathit{comp}\ @\ \omega'$, the transitivity of $\le$ allows us to deduce $B\ \mathit{comp}\ @\ \omega$:

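Proof of the second clause. Assume that $A\ \mathit{comp}\ @\ \omega$ implies $A\ \mathit{true}\ @\ \omega_1$ where $\omega \le \omega_1$. We prove $B\ \mathit{comp}\ @\ \omega$
from hypotheses $\Gamma$ as follows:

$A\ \mathit{comp}\ @\ \omega$ holds because $\Gamma \vdash A\ \mathit{comp}\ @\ \omega$.

$A\ \mathit{true}\ @\ \omega_1$ holds by the assumption on $A\ \mathit{comp}\ @\ \omega$.

$B\ \mathit{comp}\ @\ \omega_1$ holds because $\Gamma, A\ \mathit{true}\ @\ \omega_1 \vdash B\ \mathit{comp}\ @\ \omega_1$.

$B\ \mathit{true}\ @\ \omega_2$ holds for some world $\omega_2$ such that $\omega_1 \le \omega_2$ (by the definition of $B\ \mathit{comp}\ @\ \omega_1$).

$B\ \mathit{comp}\ @\ \omega$ holds because $\omega \le \omega_2$ by the transitivity of $\le$ (i.e., $\omega \le \omega_1 \le \omega_2$). $\Box$

We use the second clause as the substitution principle for computability judgments.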

Monotonicity  of  the  accessibility  relation  < 

We  intend  to  use  world  effects  for  accumulating  more  knowledge,  but  not  for  discarding  existing  knowledge. 
Informally  a  world  effect  causes  a  transition  to  a  world  where  more  facts  are  known  and  more  world  effects 
can  be  produced.  The  monotonicity  of  the  accessibility  relation  <  formalizes  our  intention  to  use  world 
effects  only  for  accumulating  more  knowledge: 

Definition 2.2. The accessibility relation $\le$ is monotonic if for two worlds $\omega$ and $\omega'$ such that $\omega \le \omega'$,

1) $A\ \mathit{true}\ @\ \omega$ implies $A\ \mathit{true}\ @\ \omega'$;

2) $A_1\ \mathit{true}\ @\ \omega, \cdots, A_n\ \mathit{true}\ @\ \omega \vdash A\ \mathit{comp}\ @\ \omega$ implies $A_1\ \mathit{true}\ @\ \omega', \cdots, A_n\ \mathit{true}\ @\ \omega' \vdash A\ \mathit{comp}\ @\ \omega'$.

The first condition, monotonicity of truth, states that a future world inherits all facts known about its past
worlds. It proves two new properties of hypothetical judgments:

1. If $\Gamma \vdash A\ \mathit{true}\ @\ \omega$ and $\omega \le \omega'$, then $\Gamma \vdash A\ \mathit{true}\ @\ \omega'$.

2. If $\Gamma, A\ \mathit{true}\ @\ \omega' \vdash J$ and $\omega \le \omega'$, then $\Gamma, A\ \mathit{true}\ @\ \omega \vdash J$.

The second condition, persistence of computation, states that a world effect that can be produced at world
$\omega$ under some facts (about $\omega$) can be reproduced at any future world $\omega'$ under equivalent facts (about $\omega'$).

Unlike monotonicity of truth, it uses hypothetical judgments in which all antecedents are truth judgments at
the same world as the succedent. The reason is that a world effect may require some facts about the world
at which it is produced (e.g., allocating a new reference requires an argument for initializing a new heap
cell), and its corresponding computability judgments at different worlds can be compared for persistence
only under equivalent facts about individual worlds.

Note that monotonicity of truth does not imply persistence of computation. For example, if $A\ \mathit{comp}\ @\ \omega$
holds because $A\ \mathit{true}\ @\ \omega'$ where $\omega \le \omega'$, monotonicity of truth allows us to conclude $A\ \mathit{comp}\ @\ \omega''$ for
every world $\omega''$ accessible from $\omega'$, but not for every world accessible from $\omega$.
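Monotonicity of truth has a simple reading when a world is modeled as a set of known facts: every world effect may add facts but may not remove any. The sketch below checks this condition on single transitions. It is an illustration under that modeling assumption, and the name `truth_is_monotonic` is hypothetical. Checking single transitions is enough here because set inclusion is itself reflexive and transitive, so the property extends to any sequence of effects.

```python
def truth_is_monotonic(transitions):
    """Check that each transition (before, after) keeps every fact known before it."""
    return all(before <= after for (before, after) in transitions)


empty = frozenset()
one_cell = frozenset({"cell l allocated"})
two_cells = frozenset({"cell l allocated", "cell m allocated"})

# Allocation only adds facts; deallocation discards one.
allocate_only = [(empty, one_cell), (one_cell, two_cells)]
with_deallocation = allocate_only + [(two_cells, one_cell)]
```

`truth_is_monotonic(allocate_only)` is true, while `truth_is_monotonic(with_deallocation)` is false, since the last transition discards the fact about the second cell.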

Simplified  form  of  hypothetical  judgment 

In principle, a hypothetical judgment $\Gamma \vdash J$ imposes no restriction on antecedents $\Gamma$ and succedent $J$. That
is, if $J$ is a judgment at world $\omega$, then $\Gamma$ may include both truth judgments and computability judgments
at world $\omega$ itself, past worlds of $\omega$, future worlds of $\omega$, or even those worlds unrelated to $\omega$. Thus such a
general form of hypothetical judgment allows us to express reasoning about not only the present but also the
past and future.

Examples of reasoning about the past and future are:

• If there has been a transaction failure in a database system in the past, we create a log file now.

• If the program has produced no output yet, we stop taking input.

• If the heap cell is deallocated in the future and becomes no longer available, we make a copy of it
now.

• If the program is to open the file eventually, we do not close it.
Since we intend to use $\lambda_{\bigcirc}$ only to reason about the present, the logic for $\lambda_{\bigcirc}$ imposes restrictions on
antecedents in hypothetical judgments and uses a simplified form of hypothetical judgment as described below.

First the simplified form uses as antecedents only truth judgments. If a computability judgment is to
be exploited, we use as an antecedent a truth judgment that it asserts, as shown in the second clause of
the characterization of computability judgments. Second the simplified form uses only judgments at the
same world. In other words, a hypothetical proof reasons about one present world and does not consider
its relation to past and future worlds (or unrelated worlds). The rationale for the second simplification is
two-fold:

1. Facts about past worlds automatically become facts about the present world by the monotonicity of
$\le$. Therefore there is no reason to consider facts about the past.

2. In general, facts about future worlds are unknown to the present world because of the temporal nature
of $\le$. If we were to support reasoning about future worlds, the necessity and possibility modalities
would be necessary.

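Thus the logic for $\lambda_{\bigcirc}$ uses the following two forms of hypothetical judgments:

•  $A_1\ \mathit{true} \,@\, \omega, \cdots, A_n\ \mathit{true} \,@\, \omega \vdash A\ \mathit{true} \,@\, \omega$,

which is abbreviated as $A_1\ \mathit{true}, \cdots, A_n\ \mathit{true} \vdash_s A\ \mathit{true} \,@\, \omega$.

•  $A_1\ \mathit{true} \,@\, \omega, \cdots, A_n\ \mathit{true} \,@\, \omega \vdash A\ \mathit{comp} \,@\, \omega$,

which is abbreviated as $A_1\ \mathit{true}, \cdots, A_n\ \mathit{true} \vdash_s A\ \mathit{comp} \,@\, \omega$.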

As the logic for $\lambda_{\bigcirc}$ requires only the simplified form of hypothetical judgment, we simplify the characterization of truth and computability judgments accordingly. The new characterization of truth judgments is just a special case of the previous characterization:

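Characterization of truth judgments with $\Gamma \vdash_s J$

1.  $\Gamma, A\ \mathit{true} \vdash_s A\ \mathit{true} \,@\, \omega$.

2.  If $\Gamma \vdash_s A\ \mathit{true} \,@\, \omega$ and $\Gamma, A\ \mathit{true} \vdash_s J$, then $\Gamma \vdash_s J$, where $J$ is a judgment at world $\omega$.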

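The new characterization of computability judgments does not consider transitions between worlds:

Characterization of computability judgments with $\Gamma \vdash_s J$

1.  If $\Gamma \vdash_s A\ \mathit{true} \,@\, \omega$, then $\Gamma \vdash_s A\ \mathit{comp} \,@\, \omega$.

2.  If $\Gamma \vdash_s A\ \mathit{comp} \,@\, \omega$ and $\Gamma, A\ \mathit{true} \vdash_s B\ \mathit{comp} \,@\, \omega$, then $\Gamma \vdash_s B\ \mathit{comp} \,@\, \omega$.

Proof of the second clause. Given $\Gamma = A_1\ \mathit{true}, \cdots, A_n\ \mathit{true}$, we write $\Gamma \,@\, \omega$ for $A_1\ \mathit{true} \,@\, \omega, \cdots, A_n\ \mathit{true} \,@\, \omega$. Assume $\Gamma \,@\, \omega \vdash A\ \mathit{comp} \,@\, \omega$ and $\Gamma \,@\, \omega, A\ \mathit{true} \,@\, \omega \vdash B\ \mathit{comp} \,@\, \omega$. For any world $\omega'$ such that $\omega \leq \omega'$,

$\Gamma \,@\, \omega', A\ \mathit{true} \,@\, \omega' \vdash B\ \mathit{comp} \,@\, \omega'$ holds by persistence of computation;

$\Gamma \,@\, \omega, A\ \mathit{true} \,@\, \omega' \vdash B\ \mathit{comp} \,@\, \omega'$ holds by monotonicity of truth.

Then $\Gamma \,@\, \omega \vdash B\ \mathit{comp} \,@\, \omega$, or $\Gamma \vdash_s B\ \mathit{comp} \,@\, \omega$, holds by the substitution principle for computability judgments. □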

Note that in the second clause, $A\ \mathit{comp} \,@\, \omega$ leads to (as a new hypothesis) a truth judgment at the same world instead of a future world. That is, even if $A\ \mathit{comp} \,@\, \omega$ holds because $A\ \mathit{true} \,@\, \omega'$ where $\omega \leq \omega'$, we use as a new hypothesis $A\ \mathit{true} \,@\, \omega$ instead of $A\ \mathit{true} \,@\, \omega'$. Thus we reason as if the world effect corresponding to $A\ \mathit{comp} \,@\, \omega$ did not cause a transition to the future world $\omega'$. By virtue of the monotonicity of $\leq$, this reasoning provides a simple way to test $B\ \mathit{comp} \,@\, \omega''$ for every future world $\omega''$ of $\omega$, as in the previous characterization of computability judgments. The second clause allows the type system of $\lambda_{\bigcirc}$ to typecheck a program producing a sequence of world effects without actually producing them, as will be seen in the next subsection.

2.3.2  Language constructs of $\lambda_{\bigcirc}$

To represent proofs of judgments, we use two syntactic categories: terms $M, N$ for truth judgments and expressions $E, F$ for computability judgments. Thus the Curry-Howard isomorphism gives the following correspondence, where typing judgments are annotated with worlds where terms or expressions reside:

\[
A\ \mathit{true} \,@\, \omega \iff M : A \,@\, \omega
\qquad\qquad
A\ \mathit{comp} \,@\, \omega \iff E \div A \,@\, \omega
\]

That is, we represent a proof of $A\ \mathit{true} \,@\, \omega$ as a term $M$ of type $A$ at world $\omega$, written as $M : A \,@\, \omega$, and a proof of $A\ \mathit{comp} \,@\, \omega$ as an expression $E$ of type $A$ at world $\omega$, written as $E \div A \,@\, \omega$. Analogously hypothetical judgments (of the form $\Gamma \vdash_s J$) correspond to typing judgments with typing contexts:

\[
\Gamma \vdash_s M : A \,@\, \omega
\qquad\qquad
\Gamma \vdash_s E \div A \,@\, \omega
\]

A typing context $\Gamma$ is a set of bindings $x : A$:

\[
\text{typing context} \quad \Gamma ::= \cdot \mid \Gamma, x : A
\]

$x : A$ in $\Gamma$ means that variable $x$ assumes a term that has type $A$ at a given world (i.e., world $\omega$ in $\Gamma \vdash_s M : A \,@\, \omega$ or $\Gamma \vdash_s E \div A \,@\, \omega$) but may not typecheck at other worlds. Then a term typing judgment $\Gamma \vdash_s M : A \,@\, \omega$ means that $M$ has type $A$ at world $\omega$ if $\Gamma$ is satisfied at the same world; similarly an expression typing judgment $\Gamma \vdash_s E \div A \,@\, \omega$ means that $E$ has type $A$ at world $\omega$ if $\Gamma$ is satisfied at the same world. Alternatively we may think of $\Gamma \vdash_s M : A \,@\, \omega$ or $\Gamma \vdash_s E \div A \,@\, \omega$ as typing judgments indexed by worlds.

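Terms and expressions form separate sublanguages of $\lambda_{\bigcirc}$. Their difference is manifest in the operational semantics of $\lambda_{\bigcirc}$, which draws a distinction between evaluations of terms, involving no worlds, and computations of expressions, involving transitions between worlds:

\[
M \rightharpoonup V
\qquad\qquad
E \,@\, \omega \rightharpoondown V \,@\, \omega'
\]

A term evaluation $M \rightharpoonup V$ does not interact with the world where term $M$ resides; hence the resultant value $V$ resides at the same world. In contrast, an expression computation $E \,@\, \omega \rightharpoondown V \,@\, \omega'$ may interact with world $\omega$ where expression $E$ resides, causing a transition to another world $\omega'$; hence the resultant value $V$ may not reside at the same world. Thus term evaluations are always effect-free whereas expression computations are potentially effectful (with respect to world effects).

\[
\begin{array}{llcl}
\text{type} & A, B & ::= & A \supset A \mid \bigcirc A \\
\text{term} & M, N & ::= & x \mid \lambda x{:}A.\,M \mid M\ M \mid \mathsf{cmp}\ E \\
\text{expression} & E, F & ::= & M \mid \mathsf{letcmp}\ x \triangleleft M\ \mathsf{in}\ E \\
\text{value} & V & ::= & \lambda x{:}A.\,M \mid \mathsf{cmp}\ E
\end{array}
\]

Figure 2.2: Abstract syntax for $\lambda_{\bigcirc}$.

\[
\frac{}{\Gamma, x : A \vdash_s x : A \,@\, \omega}\ \mathsf{Var}
\qquad
\frac{\Gamma, x : A \vdash_s M : B \,@\, \omega}{\Gamma \vdash_s \lambda x{:}A.\,M : A \supset B \,@\, \omega}\ {\supset}\mathsf{I}
\]
\[
\frac{\Gamma \vdash_s M_1 : A \supset B \,@\, \omega \qquad \Gamma \vdash_s M_2 : A \,@\, \omega}{\Gamma \vdash_s M_1\ M_2 : B \,@\, \omega}\ {\supset}\mathsf{E}
\qquad
\frac{\Gamma \vdash_s E \div A \,@\, \omega}{\Gamma \vdash_s \mathsf{cmp}\ E : \bigcirc A \,@\, \omega}\ {\bigcirc}\mathsf{I}
\]
\[
\frac{\Gamma \vdash_s M : A \,@\, \omega}{\Gamma \vdash_s M \div A \,@\, \omega}\ \mathsf{Term}
\qquad
\frac{\Gamma \vdash_s M : \bigcirc A \,@\, \omega \qquad \Gamma, x : A \vdash_s E \div B \,@\, \omega}{\Gamma \vdash_s \mathsf{letcmp}\ x \triangleleft M\ \mathsf{in}\ E \div B \,@\, \omega}\ {\bigcirc}\mathsf{E}
\]

Figure 2.3: Typing rules of $\lambda_{\bigcirc}$.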

Note that worlds are required by both the type system and the operational semantics of $\lambda_{\bigcirc}$. That is, worlds are both compile-time objects and run-time objects in the definition of $\lambda_{\bigcirc}$. As worlds are involved in expression computations and hence definitely serve as run-time objects, one could argue that abstractions of worlds rather than worlds themselves (e.g., store typing contexts rather than stores) are more appropriate for the type system. Our view is that worlds are acceptable to use in the type system for the same reason that terms and expressions appear in both the type system and the operational semantics: the type system determines static properties of terms and expressions, and the operational semantics describes how to reduce terms and expressions; likewise the type system determines static properties of worlds (with respect to terms and expressions), and the operational semantics describes transitions between worlds.

Incidentally the type system of $\lambda_{\bigcirc}$ is designed in such a way that only an initial world at which the run-time system starts (e.g., an empty store) is required for typechecking any program. Hence no practical problem arises in implementing the type system as we can simply disregard worlds.

Below we introduce all term and expression constructs of $\lambda_{\bigcirc}$. Figure 2.2 summarizes the abstract syntax for $\lambda_{\bigcirc}$. Figure 2.3 summarizes the typing rules of $\lambda_{\bigcirc}$. We use $x, y, z$ for variables.


Term  constructs 

As terms represent proofs of truth judgments, the characterization of truth judgments gives properties of terms when interpreted via the Curry-Howard isomorphism. The first clause gives the following rule where variable $x$ is used as a term:

\[
\frac{}{\Gamma, x : A \vdash_s x : A \,@\, \omega}\ \mathsf{Var}
\]

The second clause gives the substitution principle for terms:

Substitution principle for terms

If $\Gamma \vdash_s M : A \,@\, \omega$ and $\Gamma, x : A \vdash_s N : B \,@\, \omega$, then $\Gamma \vdash_s [M/x]N : B \,@\, \omega$.

If $\Gamma \vdash_s M : A \,@\, \omega$ and $\Gamma, x : A \vdash_s E \div B \,@\, \omega$, then $\Gamma \vdash_s [M/x]E \div B \,@\, \omega$.

$[M/x]N$ and $[M/x]E$ denote capture-avoiding term substitutions which substitute $M$ for all occurrences of $x$ in $N$ and $E$. We will give the definition of term substitution after introducing all term and expression constructs.

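We apply the Curry-Howard isomorphism to truth judgments by introducing an implication connective $\supset$ such that $\Gamma \vdash_s A \supset B\ \mathit{true} \,@\, \omega$ expresses $\Gamma, A\ \mathit{true} \vdash_s B\ \mathit{true} \,@\, \omega$. It gives the following introduction and elimination rules, where we use a lambda abstraction $\lambda x{:}A.\,M$ and a lambda application $M_1\ M_2$ as terms:

\[
\frac{\Gamma, x : A \vdash_s M : B \,@\, \omega}{\Gamma \vdash_s \lambda x{:}A.\,M : A \supset B \,@\, \omega}\ {\supset}\mathsf{I}
\qquad
\frac{\Gamma \vdash_s M_1 : A \supset B \,@\, \omega \qquad \Gamma \vdash_s M_2 : A \,@\, \omega}{\Gamma \vdash_s M_1\ M_2 : B \,@\, \omega}\ {\supset}\mathsf{E}
\]

We use a reduction relation $\Rightarrow_{\beta\mathsf{term}}$ in both the term reduction rule for $\supset$ and its corresponding proof reduction:

\[
(\lambda x{:}A.\,N)\ M \;\Rightarrow_{\beta\mathsf{term}}\; [M/x]N \qquad (\beta_{\supset})
\]

\[
\frac{\dfrac{\Gamma, x : A \vdash_s N : B \,@\, \omega}{\Gamma \vdash_s \lambda x{:}A.\,N : A \supset B \,@\, \omega}\ {\supset}\mathsf{I} \qquad \Gamma \vdash_s M : A \,@\, \omega}{\Gamma \vdash_s (\lambda x{:}A.\,N)\ M : B \,@\, \omega}\ {\supset}\mathsf{E}
\quad\Rightarrow_{\beta\mathsf{term}}\quad
\Gamma \vdash_s [M/x]N : B \,@\, \omega
\]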


Expression  constructs 

Similarly to truth judgments, we begin by interpreting the characterization of computability judgments in terms of typing judgments. The first clause means that a term of type $A$ is also an expression of the same type:

\[
\frac{\Gamma \vdash_s M : A \,@\, \omega}{\Gamma \vdash_s M \div A \,@\, \omega}\ \mathsf{Term}
\]

The second clause gives the substitution principle for expressions:

Substitution principle for expressions

If $\Gamma \vdash_s E \div A \,@\, \omega$ and $\Gamma, x : A \vdash_s F \div B \,@\, \omega$, then $\Gamma \vdash_s \langle E/x \rangle F \div B \,@\, \omega$.

Unlike a term substitution $[M/x]F$ which analyzes the structure of $F$, an expression substitution $\langle E/x \rangle F$ analyzes the structure of $E$ instead of $F$. This is because $\langle E/x \rangle F$ is intended to ensure that both $E$ and $F$ are computed exactly once and in that order: first we compute $E$ to obtain a value; then we proceed to compute $F$ with $x$ bound to the value. Therefore we should not replicate $E$ within $F$ (at those places where $x$ occurs), which would result in computing $E$ multiple times. Instead we should conceptually replicate $F$ within $E$ (at those places where the computation of $E$ finishes) so that the whole computation ends up computing both $E$ and $F$ only once. In this sense, an expression substitution $\langle E/x \rangle F$ substitutes not $E$ into $F$, but $F$ into $E$. We will give the definition of expression substitution after introducing all expression constructs.

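We apply the Curry-Howard isomorphism to computability judgments by internalizing $A\ \mathit{comp} \,@\, \omega$ with a modality $\bigcirc$ so that $\Gamma \vdash_s \bigcirc A\ \mathit{true} \,@\, \omega$ expresses $\Gamma \vdash_s A\ \mathit{comp} \,@\, \omega$. The introduction and elimination rules use a computation term $\mathsf{cmp}\ E$ and a bind expression $\mathsf{letcmp}\ x \triangleleft M\ \mathsf{in}\ E$:

\[
\frac{\Gamma \vdash_s E \div A \,@\, \omega}{\Gamma \vdash_s \mathsf{cmp}\ E : \bigcirc A \,@\, \omega}\ {\bigcirc}\mathsf{I}
\qquad
\frac{\Gamma \vdash_s M : \bigcirc A \,@\, \omega \qquad \Gamma, x : A \vdash_s E \div B \,@\, \omega}{\Gamma \vdash_s \mathsf{letcmp}\ x \triangleleft M\ \mathsf{in}\ E \div B \,@\, \omega}\ {\bigcirc}\mathsf{E}
\]

We use a reduction relation $\Rightarrow_{\beta\mathsf{exp}}$ in both the expression reduction rule for $\bigcirc$ and its corresponding proof reduction:

\[
\mathsf{letcmp}\ x \triangleleft \mathsf{cmp}\ E\ \mathsf{in}\ F \;\Rightarrow_{\beta\mathsf{exp}}\; \langle E/x \rangle F \qquad (\beta_{\bigcirc})
\]

\[
\frac{\dfrac{\Gamma \vdash_s E \div A \,@\, \omega}{\Gamma \vdash_s \mathsf{cmp}\ E : \bigcirc A \,@\, \omega}\ {\bigcirc}\mathsf{I} \qquad \Gamma, x : A \vdash_s F \div B \,@\, \omega}{\Gamma \vdash_s \mathsf{letcmp}\ x \triangleleft \mathsf{cmp}\ E\ \mathsf{in}\ F \div B \,@\, \omega}\ {\bigcirc}\mathsf{E}
\quad\Rightarrow_{\beta\mathsf{exp}}\quad
\Gamma \vdash_s \langle E/x \rangle F \div B \,@\, \omega
\]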

$\mathsf{cmp}\ E$ denotes the computation of $E$, but does not actually compute $E$; hence we say that $\mathsf{cmp}\ E$ encapsulates the computation of $E$. $\mathsf{letcmp}\ x \triangleleft M\ \mathsf{in}\ E$ enables us to sequence two computations (if $M$ evaluates to a computation term).

Note that the typing rule ${\bigcirc}\mathsf{E}$ does not accurately reflect the operational behavior of $\mathsf{letcmp}\ x \triangleleft M\ \mathsf{in}\ E$. Specifically, while the rule ${\bigcirc}\mathsf{E}$ typechecks $E$ at the same world $\omega$ that it typechecks $M$, the computation of $E$ may take place at a different world $\omega'$ (where $\omega \leq \omega'$) because of an expression computation preceding the computation of $E$. Nevertheless it is a sound typing rule because the monotonicity of the accessibility relation $\leq$ allows the type system to reason as if a world effect did not cause a transition to another world, as clarified in the characterization of computability judgments.
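The typing rules of Figure 2.3 can be read as a recursive checker. The following is a minimal sketch in Python, not the implementation used in this dissertation. It disregards worlds, as the text permits for typechecking. The tagged-tuple encoding, the names `type_of_term` and `type_of_expr`, and the `("base", name)` type are assumptions introduced only for illustration (the grammar of Figure 2.2 has no base types).

```python
# Types (tagged tuples):
#   ("base", name)   hypothetical base type, not part of the source grammar
#   ("arrow", A, B)  the implication type A ⊃ B
#   ("circ", A)      the modal type ○A
# Terms:       ("var", x), ("lam", x, A, M), ("app", M, N), ("cmp", E)
# Expressions: any term, or ("letcmp", x, M, E)


def type_of_term(ctx, m):
    """Return A such that ctx ⊢ m : A, or raise TypeError."""
    tag = m[0]
    if tag == "var":  # hypothesis rule
        if m[1] not in ctx:
            raise TypeError(f"unbound variable {m[1]}")
        return ctx[m[1]]
    if tag == "lam":  # introduction rule for ⊃
        _, x, a, body = m
        return ("arrow", a, type_of_term({**ctx, x: a}, body))
    if tag == "app":  # elimination rule for ⊃
        fun_ty = type_of_term(ctx, m[1])
        arg_ty = type_of_term(ctx, m[2])
        if fun_ty[0] != "arrow" or fun_ty[1] != arg_ty:
            raise TypeError("ill-typed application")
        return fun_ty[2]
    if tag == "cmp":  # introduction rule for ○
        return ("circ", type_of_expr(ctx, m[1]))
    raise TypeError(f"not a term: {m!r}")


def type_of_expr(ctx, e):
    """Return A such that ctx ⊢ e ÷ A, or raise TypeError."""
    if e[0] == "letcmp":  # elimination rule for ○
        _, x, m, body = e
        m_ty = type_of_term(ctx, m)
        if m_ty[0] != "circ":
            raise TypeError("letcmp needs a term of type ○A")
        return type_of_expr({**ctx, x: m_ty[1]}, body)
    return type_of_term(ctx, e)  # rule Term: every term is an expression


A = ("base", "a")
identity = ("lam", "x", A, ("var", "x"))
# cmp (letcmp y ◁ c in y), typed under the assumption c : ○a
sequenced = ("cmp", ("letcmp", "y", ("var", "c"), ("var", "y")))
```

Keeping two mutually recursive functions mirrors the separation of terms and expressions into two sublanguages: a bind expression is rejected by `type_of_term` unless it is first wrapped in `cmp`.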

Computation terms and bind expressions may be thought of as monadic constructs, since the modality $\bigcirc$ forms a monad. In Haskell syntax, the monad could be written as follows:

    instance Monad ○ where
      return M = cmp M
      M >>= N  = cmp letcmp x ◁ M in
                     letcmp y ◁ N x in
                     y

The above definition satisfies the monadic laws [77], modulo the expression reduction rule $\beta_{\bigcirc}$ and a term expansion rule $\gamma_{\bigcirc}$ for the modality $\bigcirc$:

\[
M \;\Rightarrow_{\eta\mathsf{exp}}\; \mathsf{cmp}\ \mathsf{letcmp}\ x \triangleleft M\ \mathsf{in}\ x \qquad (\gamma_{\bigcirc})
\]

However, once we introduce a fixed point construct for terms, the rule $\gamma_{\bigcirc}$ becomes invalid. For example, if $M$ is a fixed point construct whose reduction never terminates, its expansion into $\mathsf{cmp}\ \mathsf{letcmp}\ x \triangleleft M\ \mathsf{in}\ x$ is not justified because the expanded term is already a value, so its reduction terminates immediately. Hence the modality $\bigcirc$ ceases to form a monad, and we do not call $\lambda_{\bigcirc}$ a monadic language.

2.3.3  Substitutions 

Now that all term and expression constructs have been introduced, we define term and expression substitutions. We first consider term substitutions, which are essentially textual substitutions.


Term  substitution 

Term substitutions $[M/x]N$ and $[M/x]E$ are straightforward to define as they correspond to substituting a proof of $A\ \mathit{true} \,@\, \omega$ for a hypothesis in a hypothetical proof. To formally define term substitutions, we need a mapping $FV(\cdot)$ for obtaining the set of free variables in a given term or expression; a free variable is one that is not bound in lambda abstractions and bind expressions:

\[
\begin{array}{lcl}
FV(x) & = & \{x\} \\
FV(\lambda x{:}A.\,M) & = & FV(M) - \{x\} \\
FV(M_1\ M_2) & = & FV(M_1) \cup FV(M_2) \\
FV(\mathsf{cmp}\ E) & = & FV(E) \\
FV(\mathsf{letcmp}\ x \triangleleft M\ \mathsf{in}\ E) & = & FV(M) \cup (FV(E) - \{x\})
\end{array}
\]
Figure 2.4: A schematic view of $\langle E/x \rangle F$.


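In the definition of $[M/x]N$ and $[M/x]E$, we implicitly rename bound variables in $N$ and $E$ as necessary to avoid the capture of free variables in $M$:*

\[
\begin{array}{lcll}
[M/x]y & = & M & \text{if } x = y \\
       & = & y & \text{otherwise} \\
{[M/x]}\lambda y{:}A.\,N & = & \lambda y{:}A.\,[M/x]N & x \neq y,\ y \notin FV(M) \\
{[M/x]}(N_1\ N_2) & = & [M/x]N_1\ [M/x]N_2 & \\
{[M/x]}\mathsf{cmp}\ E & = & \mathsf{cmp}\ [M/x]E & \\
{[M/x]}\mathsf{letcmp}\ y \triangleleft N\ \mathsf{in}\ E & = & \mathsf{letcmp}\ y \triangleleft [M/x]N\ \mathsf{in}\ [M/x]E & x \neq y,\ y \notin FV(M)
\end{array}
\]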


The above definition of term substitution conforms to the substitution principle for terms:

Proposition 2.3 (Substitution principle for terms).

If $\Gamma \vdash_s M : A \,@\, \omega$ and $\Gamma, x : A \vdash_s N : B \,@\, \omega$, then $\Gamma \vdash_s [M/x]N : B \,@\, \omega$.

If $\Gamma \vdash_s M : A \,@\, \omega$ and $\Gamma, x : A \vdash_s E \div B \,@\, \omega$, then $\Gamma \vdash_s [M/x]E \div B \,@\, \omega$.

Proof. By simultaneous induction on the structure of $N$ and $E$. □

Proposition 2.3 implies that term reductions by $\Rightarrow_{\beta\mathsf{term}}$ are indeed type-preserving:

Corollary 2.4 (Type preservation of $\Rightarrow_{\beta\mathsf{term}}$).

If $\Gamma \vdash_s (\lambda x{:}A.\,N)\ M : B \,@\, \omega$, then $\Gamma \vdash_s [M/x]N : B \,@\, \omega$.

* Hence a term substitution does not need to be defined in all cases.
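The free-variable mapping and capture-avoiding term substitution can be sketched executably as follows. This is only an illustration under assumptions: the tagged-tuple encoding and the helper names `fv`, `fresh`, `rename_if_needed`, and `subst` are hypothetical, and the implicit renaming of bound variables is made explicit by appending primes to a clashing name.

```python
# Terms:       ("var", x), ("lam", x, A, M), ("app", M, N), ("cmp", E)
# Expressions: any term, or ("letcmp", x, M, E)


def fv(t):
    """Free variables of a term or expression."""
    tag = t[0]
    if tag == "var":
        return {t[1]}
    if tag == "lam":
        return fv(t[3]) - {t[1]}
    if tag == "app":
        return fv(t[1]) | fv(t[2])
    if tag == "cmp":
        return fv(t[1])
    if tag == "letcmp":  # x is bound in the body only, not in M
        return fv(t[2]) | (fv(t[3]) - {t[1]})
    raise ValueError(f"unknown construct: {t!r}")


def fresh(name, avoid):
    """Pick a primed variant of name that does not occur in avoid."""
    candidate = name
    while candidate in avoid:
        candidate += "'"
    return candidate


def rename_if_needed(m, x, y, body):
    """Rename binder y in body when it would capture a free variable of m."""
    if y not in fv(m):
        return y, body
    z = fresh(y, fv(m) | fv(body) | {x})
    return z, subst(("var", z), y, body)


def subst(m, x, t):
    """Capture-avoiding substitution [m/x]t."""
    tag = t[0]
    if tag == "var":
        return m if t[1] == x else t
    if tag == "app":
        return ("app", subst(m, x, t[1]), subst(m, x, t[2]))
    if tag == "cmp":
        return ("cmp", subst(m, x, t[1]))
    if tag == "lam":
        _, y, a, body = t
        if y == x:  # x is shadowed; same result as renaming y first
            return t
        y, body = rename_if_needed(m, x, y, body)
        return ("lam", y, a, subst(m, x, body))
    if tag == "letcmp":
        _, y, n, body = t
        n = subst(m, x, n)
        if y == x:
            return ("letcmp", y, n, body)
        y, body = rename_if_needed(m, x, y, body)
        return ("letcmp", y, n, subst(m, x, body))
    raise ValueError(f"unknown construct: {t!r}")


A = ("base", "a")  # hypothetical base type, used only as an annotation
# [y/x](λy:a. x y): the binder y must be renamed before substituting
example = subst(("var", "y"), "x",
                ("lam", "y", A, ("app", ("var", "x"), ("var", "y"))))
```

In the example, the bound `y` is renamed to `y'` so that the substituted free `y` is not captured.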
Expression substitution

Given $\Gamma \vdash_s E \div A \,@\, \omega$ and $\Gamma, x : A \vdash_s F \div B \,@\, \omega$, an expression substitution combines the two typing judgments by finding an expression $\langle E/x \rangle F$ such that $\Gamma \vdash_s \langle E/x \rangle F \div B \,@\, \omega$. It corresponds to substituting a hypothetical proof using $A\ \mathit{true} \,@\, \omega$ as a hypothesis into a proof of $A\ \mathit{comp} \,@\, \omega$.

Figure 2.4 shows a schematic view of an expression substitution $\langle E/x \rangle F$. Expression $E$ contains a term $M$ of type $A$ which ultimately determines its type. For example, $E = \mathsf{letcmp}\ x \triangleleft N\ \mathsf{in}\ M$ has the same type as $M$, and if $M$ is replaced by another expression $E'$ of type $A'$, the resultant expression also has type $A'$. Operationally the computation of $E$ finishes by evaluating $M$. Expression $F$ contains variable $x$ which corresponds to a hypothesis $A\ \mathit{true} \,@\, \omega$ in a hypothetical proof of $B\ \mathit{comp} \,@\, \omega$. $\langle E/x \rangle F$ first substitutes $M$ for $x$ in $F$, which results in a new expression $[M/x]F$ of type $B$; then it replaces $M$ in $E$ by $[M/x]F$. In this way, $\langle E/x \rangle F$ substitutes $F$ into $E$, rather than $E$ into $F$. Note that although $\langle E/x \rangle F$ transforms the structure of $E$, it has the same type as $F$ because its type is ultimately determined by whatever expression replaces $M$.

Thus $\langle E/x \rangle F$ analyzes the structure of $E$, instead of $F$, to find a term that ultimately determines the type of $E$:

\[
\begin{array}{lcl}
\langle M/x \rangle F & = & [M/x]F \\
\langle \mathsf{letcmp}\ y \triangleleft M\ \mathsf{in}\ E'/x \rangle F & = & \mathsf{letcmp}\ y \triangleleft M\ \mathsf{in}\ \langle E'/x \rangle F
\end{array}
\]

The above definition of expression substitution conforms to the substitution principle for expressions:

Proposition 2.5 (Substitution principle for expressions).

If $\Gamma \vdash_s E \div A \,@\, \omega$ and $\Gamma, x : A \vdash_s F \div B \,@\, \omega$, then $\Gamma \vdash_s \langle E/x \rangle F \div B \,@\, \omega$.

Proof. By induction on the structure of $E$ (not $F$). □

Proposition 2.5 implies that expression reductions by $\Rightarrow_{\beta\mathsf{exp}}$ are indeed type-preserving:

Corollary 2.6 (Type preservation of $\Rightarrow_{\beta\mathsf{exp}}$).

If $\Gamma \vdash_s \mathsf{letcmp}\ x \triangleleft \mathsf{cmp}\ E\ \mathsf{in}\ F \div B \,@\, \omega$, then $\Gamma \vdash_s \langle E/x \rangle F \div B \,@\, \omega$.
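A small sketch may make the direction of expression substitution concrete. It assumes a hypothetical tagged-tuple encoding of terms and expressions. The helper `term_subst` is deliberately naive (no renaming), on the labeled assumption that bound names are distinct from `x` and from all free variables involved. The function `expr_subst` follows the two defining equations: it walks down `E` and grafts `[M/x]F` where the final term `M` of `E` sits.

```python
# Terms:       ("var", x), ("lam", x, A, M), ("app", M, N), ("cmp", E)
# Expressions: any term, or ("letcmp", x, M, E)


def term_subst(m, x, t):
    """Naive [m/x]t. Assumption: no bound name in t equals x or occurs
    free in m, so no renaming is required in this sketch."""
    tag = t[0]
    if tag == "var":
        return m if t[1] == x else t
    if tag == "lam":
        return ("lam", t[1], t[2], term_subst(m, x, t[3]))
    if tag == "app":
        return ("app", term_subst(m, x, t[1]), term_subst(m, x, t[2]))
    if tag == "cmp":
        return ("cmp", term_subst(m, x, t[1]))
    if tag == "letcmp":
        return ("letcmp", t[1], term_subst(m, x, t[2]), term_subst(m, x, t[3]))
    raise ValueError(f"unknown construct: {t!r}")


def expr_subst(e, x, f):
    """Expression substitution <e/x>f, by recursion on e (not on f)."""
    if e[0] == "letcmp":
        _, y, m, rest = e
        # <letcmp y ◁ m in rest / x> f  =  letcmp y ◁ m in <rest/x> f
        return ("letcmp", y, m, expr_subst(rest, x, f))
    # e is a term M:  <M/x> f = [M/x] f
    return term_subst(e, x, f)


# E = letcmp y ◁ c in letcmp z ◁ d in z     F = g x
E = ("letcmp", "y", ("var", "c"), ("letcmp", "z", ("var", "d"), ("var", "z")))
F = ("app", ("var", "g"), ("var", "x"))
result = expr_subst(E, "x", F)
```

Here `result` keeps both binds of `E` exactly once and ends in `g z`, so `E` is sequenced before `F` rather than copied into each occurrence of `x`.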

2.3.4  World  terms  and  instructions 

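The operational semantics of $\lambda_{\bigcirc}$ provides rules for term evaluations $M \rightharpoonup V$ and expression computations $E \,@\, \omega \rightharpoondown V \,@\, \omega'$. For term evaluations, we introduce a term reduction $M \mapsto_t N$ such that $M \mapsto_t^{*} V$ is identified with $M \rightharpoonup V$, where $\mapsto_t^{*}$ is the reflexive and transitive closure of $\mapsto_t$; for expression computations, we introduce an expression reduction $E \,@\, \omega \mapsto_e F \,@\, \omega'$ such that $E \,@\, \omega \mapsto_e^{*} V \,@\, \omega'$ is identified with $E \,@\, \omega \rightharpoondown V \,@\, \omega'$, where $\mapsto_e^{*}$ is the reflexive and transitive closure of $\mapsto_e$:

\[
M \mapsto_t^{*} V \ \text{ iff } \ M \rightharpoonup V
\qquad\qquad
E \,@\, \omega \mapsto_e^{*} V \,@\, \omega' \ \text{ iff } \ E \,@\, \omega \rightharpoondown V \,@\, \omega'
\]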

At this point, there is no language construct for producing world effects and no typing rules and reduction rules actually require worlds. That is, all language constructs introduced so far are purely logical in that their definition is explained either by properties of judgments (e.g., variables, inclusion of terms into expressions) or by introduction and elimination rules (e.g., lambda abstractions, lambda applications). In fact, if we erase $@\, \omega$ from typing judgments, $\lambda_{\bigcirc}$ reverts to the monadic language of Pfenning and Davies [60]. Thus we introduce language constructs for interacting with worlds before presenting the operational semantics.

We use instructions $I$ as expressions for producing world effects. As an interface to worlds, they are provided by the programming environment. For example, an instruction $\mathsf{new}\ M$ for allocating new references produces a world effect by causing a change to the store, and returns a reference. An instruction may have arguments, and term substitution on instructions with arguments is defined in a structural way; hence Proposition 2.3 continues to hold.

We refer to those objects originating from worlds, such as references, as world terms $W$. Since they cannot be decomposed into ordinary terms, world terms are assumed to be atomic values (containing no subterms) and are given special world term types $\mathcal{W}$. For example, reference type $\mathsf{ref}\ A$ is a world term type for references. Note that while world terms may not contain ordinary terms, world term types may contain ordinary types (e.g., $\mathsf{ref}\ A$).

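The new abstract syntax for $\lambda_{\bigcirc}$ is as follows:

\[
\begin{array}{llcl}
\text{type} & A & ::= & \cdots \mid \mathcal{W} \\
\text{world term type} & \mathcal{W} & & \\
\text{term} & M & ::= & \cdots \mid W \\
\text{world term} & W & & \\
\text{expression} & E & ::= & \cdots \mid I \\
\text{instruction} & I & & \\
\text{value} & V & ::= & \cdots \mid W
\end{array}
\]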

The type of a world term may depend on the world where it resides. For example, a reference is a pointer to a heap cell and its type depends on the store for which it is valid. Therefore typing rules for world terms may have to analyze worlds. Since world terms are atomic values, typing judgments for world terms do not require typing contexts. In contrast, typing judgments for instructions require typing contexts because instructions may include terms as arguments:

\[
W : \mathcal{W} \,@\, \omega
\qquad\qquad
\Gamma \vdash_s I \div A \,@\, \omega
\]

Note that an instruction does not necessarily have a world term type. For example, an instruction for dereferencing references can have any type because heap cells can contain values of any type.

If an instruction $I$ whose arguments are all values typechecks at a world $\omega$ under an empty typing context, we regard it as reducible at $\omega$; moreover we require that an instruction reduction $I \,@\, \omega \mapsto_e V \,@\, \omega'$ be type-preserving so that $V$ has the same type as $I$:

Type-preservation/progress requirement on instructions

If $\cdot \vdash_s I \div A \,@\, \omega$ and arguments to $I$ are all values, then there exists a world $\omega'$ satisfying $I \,@\, \omega \mapsto_e V \,@\, \omega'$ and $\cdot \vdash_s V : A \,@\, \omega'$.

We allow $\omega = \omega'$, which means that a world effect does not always cause a change to a world (e.g., reading the contents of a store is still a world effect).

As $I \,@\, \omega \mapsto_e V \,@\, \omega'$ means that instruction $I$ computes to value $V$ causing a transition of world from $\omega$ to $\omega'$, it implies $\omega \leq \omega'$. Now the accessibility relation $\leq$ is fully specified by instruction reductions under the assumption that it is reflexive and transitive. Note that without additional requirements on instructions, there is no guarantee that the monotonicity of $\leq$ is maintained. For example, an instruction for deallocating an existing reference $l$ violates monotonicity of truth if $l$ no longer typechecks after it is deallocated, and violates persistence of computation if its corresponding heap cell is discarded. In order to maintain the monotonicity of $\leq$, we further require that all instruction reductions be designed in such a way that types of world terms and instructions are unaffected by $\leq$:

Monotonicity requirement on instructions

1)  If $\omega \leq \omega'$, then $W : \mathcal{W} \,@\, \omega$ implies $W : \mathcal{W} \,@\, \omega'$.

2)  If $\omega \leq \omega'$, then $\Gamma \vdash_s I \div A \,@\, \omega$ implies $\Gamma \vdash_s I \div A \,@\, \omega'$, where for each argument $M$ to $I$, we assume that $\Gamma \vdash_s M : B \,@\, \omega$ implies $\Gamma \vdash_s M : B \,@\, \omega'$.
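\[
\frac{M \mapsto_t M'}{M\ N \mapsto_t M'\ N}\ T_{\beta L}
\qquad
\frac{}{(\lambda x{:}A.\,M)\ N \mapsto_t [N/x]M}\ T_{\beta}
\qquad
\frac{M \mapsto_t N}{M \,@\, \omega \mapsto_e N \,@\, \omega}\ E_{Term}
\]
\[
\frac{M \mapsto_t N}{\mathsf{letcmp}\ x \triangleleft M\ \mathsf{in}\ F \,@\, \omega \mapsto_e \mathsf{letcmp}\ x \triangleleft N\ \mathsf{in}\ F \,@\, \omega}\ E_{Bind}
\]
\[
\frac{}{\mathsf{letcmp}\ x \triangleleft \mathsf{cmp}\ E\ \mathsf{in}\ F \,@\, \omega \mapsto_e \langle E/x \rangle F \,@\, \omega}\ E_{Bind\beta}
\]
\[
\frac{I \,@\, \omega \mapsto_e V \,@\, \omega'}{\mathsf{letcmp}\ x \triangleleft \mathsf{cmp}\ I\ \mathsf{in}\ F \,@\, \omega \mapsto_e \mathsf{letcmp}\ x \triangleleft \mathsf{cmp}\ V\ \mathsf{in}\ F \,@\, \omega'}\ E_{BindI}
\]

Figure 2.5: Operational semantics of $\lambda_{\bigcirc}$ which uses expression substitutions for expression computations.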


The first clause corresponds to monotonicity of truth, and the second clause to persistence of computation. Under the monotonicity requirement, instruction reductions never affect types of existing terms and expressions:

Proposition 2.7 (Monotonicity of $\leq$).

If $\omega \leq \omega'$, then

$\Gamma \vdash_s M : A \,@\, \omega$ implies $\Gamma \vdash_s M : A \,@\, \omega'$, and
$\Gamma \vdash_s E \div A \,@\, \omega$ implies $\Gamma \vdash_s E \div A \,@\, \omega'$.

Proof. By simultaneous induction on the structure of $M$ and $E$. □

Unlike other expression constructs, instructions are not explained logically and no expression substitution can be defined on them. Intuitively $\langle I/x \rangle F$ cannot be reduced into another expression because $I$ itself does not reveal a term that is evaluated at the end of its computation. Such a term (which is indeed a value) becomes known only after an instruction reduction $I \,@\, \omega \mapsto_e V \,@\, \omega'$. We should therefore never attempt to directly reduce $\mathsf{letcmp}\ x \triangleleft \mathsf{cmp}\ I\ \mathsf{in}\ F$ into $\langle I/x \rangle F$. For the sake of convenience and uniform notation, however, we abuse the notation $\langle I/x \rangle F$ with the following definition, which effectively prevents $\mathsf{letcmp}\ x \triangleleft \mathsf{cmp}\ I\ \mathsf{in}\ F$ from being reduced by $\Rightarrow_{\beta\mathsf{exp}}$:

\[
\langle I/x \rangle F = \mathsf{letcmp}\ x \triangleleft \mathsf{cmp}\ I\ \mathsf{in}\ F
\]

This definition of $\langle I/x \rangle F$ allows $\Rightarrow_{\beta\mathsf{exp}}$ to be applied to any part of a given expression; Proposition 2.5 also continues to hold.

2.3.5  Operational  semantics 

A term reduction by $\Rightarrow_{\beta\mathsf{term}}$ and an expression reduction by $\Rightarrow_{\beta\mathsf{exp}}$ are both proof reductions and may be applied to any part of a given term or expression without affecting its type. An operational semantics of $\lambda_{\bigcirc}$ defines the term reduction relation $\mapsto_t$ and the expression reduction relation $\mapsto_e$ by specifying a strategy for arranging reductions by $\Rightarrow_{\beta\mathsf{term}}$ and $\Rightarrow_{\beta\mathsf{exp}}$. Below we consider two different styles of operational semantics (both of which use the same syntax for reduction relations). For each instruction $I$, we assume an instruction reduction $I \,@\, \omega \mapsto_e V \,@\, \omega'$, which causes a transition of world from $\omega$ to $\omega'$; if $I$ has arguments, we first reduce them into values by applying $\mapsto_t$ repeatedly.

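Figure 2.5 shows an operational semantics of $\lambda_{\bigcirc}$ which uses expression substitutions $\langle E/x \rangle F$ for expression computations; for term evaluations, we can choose any reduction strategy (Figure 2.5 uses a call-by-name discipline). The rule $T_{\beta}$ is a shorthand for applying $\Rightarrow_{\beta\mathsf{term}}$ to $(\lambda x{:}A.\,M)\ N$. The rules $E_{Term}$ and $E_{Bind}$ use a term reduction $M \mapsto_t N$ to reduce a term into a value. The rule $E_{Bind\beta}$ is a shorthand for applying $\Rightarrow_{\beta\mathsf{exp}}$ to $\mathsf{letcmp}\ x \triangleleft \mathsf{cmp}\ E\ \mathsf{in}\ F$; in the case of $E = M$, it reduces $\mathsf{letcmp}\ x \triangleleft \mathsf{cmp}\ M\ \mathsf{in}\ F$ into $\langle M/x \rangle F = [M/x]F$ without further reducing $M$. The rule $E_{BindI}$ performs an instruction reduction $I \,@\, \omega \mapsto_e V \,@\, \omega'$.

\[
\frac{M \mapsto_t M'}{M\ N \mapsto_t M'\ N}\ T_{\beta L}
\qquad
\frac{N \mapsto_t N'}{(\lambda x{:}A.\,M)\ N \mapsto_t (\lambda x{:}A.\,M)\ N'}\ T_{\beta R}
\qquad
\frac{}{(\lambda x{:}A.\,M)\ V \mapsto_t [V/x]M}\ T_{\beta V}
\]
\[
\frac{M \mapsto_t N}{M \,@\, \omega \mapsto_e N \,@\, \omega}\ E_{Term}
\qquad
\frac{M \mapsto_t N}{\mathsf{letcmp}\ x \triangleleft M\ \mathsf{in}\ F \,@\, \omega \mapsto_e \mathsf{letcmp}\ x \triangleleft N\ \mathsf{in}\ F \,@\, \omega}\ E_{Bind}
\]
\[
\frac{E \,@\, \omega \mapsto_e E' \,@\, \omega'}{\mathsf{letcmp}\ x \triangleleft \mathsf{cmp}\ E\ \mathsf{in}\ F \,@\, \omega \mapsto_e \mathsf{letcmp}\ x \triangleleft \mathsf{cmp}\ E'\ \mathsf{in}\ F \,@\, \omega'}\ E_{BindR}
\]
\[
\frac{}{\mathsf{letcmp}\ x \triangleleft \mathsf{cmp}\ V\ \mathsf{in}\ F \,@\, \omega \mapsto_e [V/x]F \,@\, \omega}\ E_{BindV}
\]

Figure 2.6: Operational semantics of $\lambda_{\bigcirc}$ in the direct style.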

Figure 2.6 shows an alternative style of operational semantics, called the direct style, which requires only term substitutions $[V/x]E$ for expression computations; for term evaluations, we can choose any reduction strategy (Figure 2.6 uses a call-by-value discipline). The rules $E_{Term}$ and $E_{Bind}$ are the same as in Figure 2.5. Given $\mathsf{letcmp}\ x \triangleleft \mathsf{cmp}\ E\ \mathsf{in}\ F$, we apply the rule $E_{BindR}$ repeatedly until $E$ is reduced into a value $V$; then the rule $E_{BindV}$ reduces $\mathsf{letcmp}\ x \triangleleft \mathsf{cmp}\ V\ \mathsf{in}\ F$ into $\langle V/x \rangle F = [V/x]F$. Thus a variable is always replaced by a value (during both term evaluations and expression computations).
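The direct style lends itself to a small-step interpreter. The sketch below is an illustration under assumptions, not the dissertation's implementation: syntax is encoded as tagged tuples, instructions are argument-free `("instr", name)` nodes, world terms are `("wterm", payload)` nodes, and the instruction reduction is supplied by the caller as a function `run_instr(instr, world)` returning a value and the next world. The `tick` instruction and the integer world are hypothetical and exist only to make a world transition observable.

```python
def is_value(t):
    return t[0] in ("lam", "cmp", "wterm")


def subst(v, x, t):
    """[v/x]t for a closed value v, so no variable capture can occur."""
    tag = t[0]
    if tag == "var":
        return v if t[1] == x else t
    if tag == "lam":
        return t if t[1] == x else ("lam", t[1], t[2], subst(v, x, t[3]))
    if tag == "app":
        return ("app", subst(v, x, t[1]), subst(v, x, t[2]))
    if tag == "cmp":
        return ("cmp", subst(v, x, t[1]))
    if tag == "letcmp":
        body = t[3] if t[1] == x else subst(v, x, t[3])
        return ("letcmp", t[1], subst(v, x, t[2]), body)
    return t  # world terms and argument-free instructions


def step_term(m):
    """One call-by-value term reduction step."""
    if m[0] != "app":
        raise ValueError(f"stuck term: {m!r}")
    fun, arg = m[1], m[2]
    if not is_value(fun):
        return ("app", step_term(fun), arg)      # reduce the function part
    if fun[0] != "lam":
        raise ValueError("application of a non-function")
    if not is_value(arg):
        return ("app", fun, step_term(arg))      # reduce the argument
    return subst(arg, fun[1], fun[3])            # beta step with a value


def step_expr(e, world, run_instr):
    """One expression reduction step; returns (expression, world)."""
    if e[0] == "instr":
        return run_instr(e, world)               # instruction reduction
    if e[0] != "letcmp":
        return step_term(e), world               # a term used as expression
    _, x, m, f = e
    if not is_value(m):
        return ("letcmp", x, step_term(m), f), world
    if m[0] != "cmp":
        raise ValueError("letcmp needs a computation term")
    inner = m[1]
    if is_value(inner):
        return subst(inner, x, f), world         # bind a value to x
    inner, world = step_expr(inner, world, run_instr)
    return ("letcmp", x, ("cmp", inner), f), world


def run(e, world, run_instr):
    while not is_value(e):
        e, world = step_expr(e, world, run_instr)
    return e, world


def tick(instr, world):
    """Hypothetical instruction: return the counter, then advance it."""
    return ("wterm", world), world + 1


program = ("letcmp", "a", ("cmp", ("instr", "tick")),
           ("letcmp", "b", ("cmp", ("instr", "tick")), ("var", "b")))
value, final_world = run(program, 0, tick)
```

Because only values are ever substituted, the sketch needs no expression substitution, which is the extensibility point made for the direct style.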

The direct style is more extensible than the first style because it does not use expression substitutions. That is, the introduction of a new expression construct requires only new reduction rules. In comparison, the first style hinges on expression substitutions, and requires not only new reduction rules but also an augmented definition of expression substitution for each new expression construct. If expression substitution cannot be defined on a new expression construct, we may have to further specialize existing reduction rules. For example, the rules $E_{Bind\beta}$ and $E_{BindI}$ can be thought of as derived from a common reduction rule when instructions are introduced.

The type safety of $\lambda_{\bigcirc}$ consists of two properties: type preservation and progress. The proof of type preservation uses Corollaries 2.4 and 2.6, the type-preservation/progress requirement on instructions, and Proposition 2.7. The proof of progress requires a canonical forms lemma. In either style of the operational semantics, all proofs proceed in the same way.

Theorem 2.8 (Type preservation).

If $M \mapsto_t N$ and $\cdot \vdash_s M : A \,@\, \omega$, then $\cdot \vdash_s N : A \,@\, \omega$.

If $E \,@\, \omega \mapsto_e F \,@\, \omega'$ and $\cdot \vdash_s E \div A \,@\, \omega$, then $\cdot \vdash_s F \div A \,@\, \omega'$.

Proof. By induction on the structure of $M$ and $E$. □

Lemma 2.9 (Canonical forms).

If $V$ is a value of type $A \supset B$, then $V$ is a lambda abstraction $\lambda x{:}A.\,M$.

If $V$ is a value of type $\bigcirc A$, then $V$ is a computation term $\mathsf{cmp}\ E$.

Proof. By inspection of the typing rules. □

Theorem 2.10 (Progress).

If $\cdot \vdash_s M : A \,@\, \omega$, then either $M$ is a value or there exists $N$ such that $M \mapsto_t N$.

If $\cdot \vdash_s E \div A \,@\, \omega$, then either $E$ is a value or there exist $F$ and $\omega'$ such that $E \,@\, \omega \mapsto_e F \,@\, \omega'$.

Proof. By induction on the structure of $M$ and $E$. □

Since expressions may produce world effects, they cannot be converted into terms. In contrast, terms can always be lifted to expressions by the typing rule $\mathsf{Term}$. Therefore we define a program as a closed expression $E$ that typechecks at a certain initial world $\omega_{initial}$, i.e., $\cdot \vdash_s E \div A \,@\, \omega_{initial}$. We choose $\omega_{initial}$ according to the world structure being employed. To run a program $E$, we compute it at $\omega_{initial}$.


2.4  Examples  of  world  effects 

In order to implement a specific notion of world effect in $\lambda_{\bigcirc}$, we specify a world structure and provide instructions to interact with worlds. In this section, we discuss three specific notions of world effect.


2.4.1  Probabilistic  computations 


In order to facilitate the coding of sampling techniques developed in simulation theory, we model a probabilistic
computation as a computation that returns a value after consuming real numbers drawn independently
from $U(0.0, 1.0]$, rather than a single such real number. A real number $r$ is a world term of type $\mathsf{real}$. A
world, the source of probabilistic choices, is represented as an infinite sequence of real numbers drawn
independently from $U(0.0, 1.0]$. We use an instruction $S$ for consuming the first real number of a given
world.


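\[
\begin{array}{llcl}
\text{world term type} & W & ::= & \mathsf{real} \\
\text{world term} & w & ::= & r \\
\text{instruction} & I & ::= & S \\
\text{world} & \omega & ::= & r_1 r_2 \cdots r_i \cdots \quad \text{where } r_i \in (0.0, 1.0]
\end{array}
\]

\[
\frac{}{\Gamma \vdash_s r : \mathsf{real} @ \omega}\ \textsf{Real}
\qquad
\frac{}{\Gamma \vdash_s S \div \mathsf{real} @ \omega}\ \textsf{Sampling}
\]

\[
\frac{}{S @ r_1 r_2 r_3 \cdots \mapsto_e r_1 @ r_2 r_3 \cdots}\ \textsf{Sampling}
\]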

It is easy to show that instruction $S$ satisfies the type-preservation/progress requirement. Since a world
does not affect types of world terms and instructions, the monotonicity of $\leq$ also holds trivially. We can use
any world as an initial world. As we will see in Chapter 3, $\lambda_{\bigcirc}$ with the above constructs for probabilistic
computations serves as the core of PTP.
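
The following Python sketch is only an illustration of the world structure described in this section, not the implementation used for PTP. It assumes that a world can be modeled as a generator of pseudo-random numbers; the names `make_world`, `bernoulli`, and `geometric` are hypothetical. It shows why a computation is allowed to consume several real numbers rather than a single one: `geometric` consumes one number per trial.

```python
import random

def make_world(seed):
    """Yield r1, r2, r3, ... forever, each drawn independently from U(0.0, 1.0]."""
    rng = random.Random(seed)
    while True:
        # rng.random() lies in [0.0, 1.0), so 1.0 - rng.random() lies in (0.0, 1.0].
        yield 1.0 - rng.random()

def S(world):
    """Instruction S: consume and return the first real number of the world."""
    return next(world)

def bernoulli(p, world):
    """Consumes exactly one real number: True with probability p."""
    return S(world) <= p

def geometric(p, world):
    """Consumes one real number per trial: counts failures before the first success."""
    failures = 0
    while not bernoulli(p, world):
        failures += 1
    return failures
```

Any iterator of numbers in (0.0, 1.0] can serve as a world here, which mirrors the remark that any world may be used as an initial world.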


2.4.2  Sequential  input/output 

We model sequential input/output with a computation that consumes an infinite input character stream $is$
and outputs to a finite output character stream $os$, where a character is a world term of type $\mathsf{char}$. We use
two instructions: $\mathsf{read\_c}$ for reading a character from the input stream and $\mathsf{write\_c}\ M$ for writing a character
to the output stream.


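\[
\begin{array}{llcl}
\text{world term type} & W & ::= & \mathsf{char} \\
\text{world term} & w & ::= & c \\
\text{instruction} & I & ::= & \mathsf{read\_c} \mid \mathsf{write\_c}\ M \\
\text{world} & \omega & ::= & (is, os) \\
 & is & ::= & c_1 c_2 c_3 \cdots \\
 & os & ::= & \mathsf{nil} \mid c :: os
\end{array}
\]

\[
\frac{}{\Gamma \vdash_s c : \mathsf{char} @ \omega}\ \textsf{Char}
\qquad
\frac{}{\Gamma \vdash_s \mathsf{read\_c} \div \mathsf{char} @ \omega}\ \textsf{Read\_c}
\qquad
\frac{\Gamma \vdash_s M : \mathsf{char} @ \omega}{\Gamma \vdash_s \mathsf{write\_c}\ M \div \mathsf{char} @ \omega}\ \textsf{Write\_c}
\]

\[
\frac{}{\mathsf{read\_c} @ (c_1 c_2 c_3 \cdots, os) \mapsto_e c_1 @ (c_2 c_3 \cdots, os)}\ \textsf{Read\_c}
\]

\[
\frac{M \mapsto_t N}{\mathsf{write\_c}\ M @ \omega \mapsto_e \mathsf{write\_c}\ N @ \omega}\ \textsf{Write\_c}
\qquad
\frac{}{\mathsf{write\_c}\ c @ (is, os) \mapsto_e c @ (is, c :: os)}\ \textsf{Write\_c}'
\]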


It is easy to show that both instructions satisfy the type-preservation/progress requirement. As in probabilistic
computations, a world does not affect types of world terms and instructions, and the monotonicity
of $\leq$ holds trivially. We use an empty output character stream $\mathsf{nil}$ in an initial world.
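
As a rough illustration of the two reduction rules, the Python sketch below models a world as a pair of an input iterator and an output tuple whose head is the most recently written character. The representation and the helper `echo` are assumptions made for this example only. Note that the Python iterator is consumed in place, so unlike the formal rules an old world cannot be reused after a read.

```python
import itertools

def read_c(world):
    """Rule Read_c: return the first input character and drop it from the input stream."""
    input_stream, output_stream = world
    c = next(input_stream)
    return c, (input_stream, output_stream)

def write_c(c, world):
    """Rule Write_c': return c and prepend it to the output stream (c :: os)."""
    input_stream, output_stream = world
    return c, (input_stream, (c,) + output_stream)

def echo(world):
    """A computation that reads one character and writes it back."""
    c, world = read_c(world)
    return write_c(c, world)

# Initial world: a hypothetical infinite input "ababab..." and the empty output stream nil.
initial_world = (itertools.cycle("ab"), ())
```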


2.4.3  Mutable  references 

Probabilistic  computations  and  sequential  input/output  are  easy  to  model  because  worlds  do  not  affect  types 
of  world  terms  and  instructions.  Mutable  references,  however,  require  world  terms  whose  type  depends  on 
worlds,  namely  references.  Consequently  worlds  should  be  designed  in  such  a  way  that  they  provide  enough 
information  on  a  given  reference  to  correctly  determine  its  type. 

We use $\mathsf{ref}\ A$ as world term types for references. A world is represented as a collection of pairs
$[l \mapsto V : A]$ of a reference $l$ and a closed value $V$ annotated with its type $A$. It may be thought of as a
well-typed store: if $[l \mapsto V : A] \in \omega$, then $V$ has type $A$ at world $\omega$ (i.e., $\cdot \vdash_s V : A @ \omega$) and references in it
are all distinct. We use three instructions: $\mathsf{new}\ M : A$ for initializing a fresh reference, $\mathsf{read}\ M$ for reading
the contents of a world, and $\mathsf{write}\ M\ M$ for updating a world. Reading the contents of a world is a world
effect, even though it does not cause a change to the world.


\[
\begin{array}{llcl}
\text{world term type} & W & ::= & \mathsf{ref}\ A \\
\text{world term} & w & ::= & l \\
\text{instruction} & I & ::= & \mathsf{new}\ M : A \mid \mathsf{read}\ M \mid \mathsf{write}\ M\ M \\
\text{world} & \omega & ::= & \cdot \mid \omega, [l \mapsto V : A]
\end{array}
\]

Figure 2.7 shows new typing rules and reduction rules.

To prove the type-preservation/progress requirement on instructions, we first show that well-typed
instructions never generate corrupt worlds (Corollaries 2.12 and 2.14). In Lemma 2.11, we do not postulate
that $\omega, [l \mapsto V : A]$ is a world (i.e., it possesses the structure of a store, but may not be well-typed).

Lemma 2.11. If $\omega$ is a world and $\cdot \vdash_s V : A @ \omega$, then

$\Gamma \vdash_s M : B @ \omega$ implies $\Gamma \vdash_s M : B @ \omega, [l \mapsto V : A]$, and

$\Gamma \vdash_s E \div B @ \omega$ implies $\Gamma \vdash_s E \div B @ \omega, [l \mapsto V : A]$, where $l$ is a fresh reference.

Proof. By simultaneous induction on the structure of $M$ and $E$. An interesting case is when $M = l'$.

If $M = l'$, then $\Gamma \vdash_s M : B @ \omega$ implies $B = \mathsf{ref}\ B'$ and $[l' \mapsto V' : B'] \in \omega$ by the rule Ref. Since
$[l' \mapsto V' : B'] \in \omega, [l \mapsto V : A]$, we have $\Gamma \vdash_s M : B @ \omega, [l \mapsto V : A]$. □

Corollary 2.12. If $\cdot \vdash_s V : A @ \omega$ where $\omega$ is a world, then $\omega, [l \mapsto V : A]$ is also a world for any fresh
reference $l$.

Proof. For each $[l' \mapsto V' : A'] \in \omega$, we have $\cdot \vdash_s V' : A' @ \omega$ because $\omega$ is a world. By Lemma 2.11, we
have $\cdot \vdash_s V' : A' @ \omega, [l \mapsto V : A]$. From $\cdot \vdash_s V : A @ \omega$ and Lemma 2.11, $\cdot \vdash_s V : A @ \omega, [l \mapsto V : A]$ also
follows. □
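Typing rules:

\[
\frac{[l \mapsto V : A] \in \omega}{\Gamma \vdash_s l : \mathsf{ref}\ A @ \omega}\ \textsf{Ref}
\qquad
\frac{\Gamma \vdash_s M : A @ \omega}{\Gamma \vdash_s \mathsf{new}\ M : A \div \mathsf{ref}\ A @ \omega}\ \textsf{New}
\]

\[
\frac{\Gamma \vdash_s M : \mathsf{ref}\ A @ \omega}{\Gamma \vdash_s \mathsf{read}\ M \div A @ \omega}\ \textsf{Read}
\qquad
\frac{\Gamma \vdash_s M : \mathsf{ref}\ A @ \omega \quad \Gamma \vdash_s N : A @ \omega}{\Gamma \vdash_s \mathsf{write}\ M\ N \div A @ \omega}\ \textsf{Write}
\]

Reduction rules:

\[
\frac{M \mapsto_t N}{\mathsf{new}\ M : A @ \omega \mapsto_e \mathsf{new}\ N : A @ \omega}\ \textsf{New}
\qquad
\frac{\text{fresh } l \text{ such that } [l \mapsto V' : A'] \notin \omega}{\mathsf{new}\ V : A @ \omega \mapsto_e l @ \omega, [l \mapsto V : A]}\ \textsf{New}'
\]

\[
\frac{M \mapsto_t N}{\mathsf{read}\ M @ \omega \mapsto_e \mathsf{read}\ N @ \omega}\ \textsf{Read}
\qquad
\frac{[l \mapsto V : A] \in \omega}{\mathsf{read}\ l @ \omega \mapsto_e V @ \omega}\ \textsf{Read}'
\]

\[
\frac{M \mapsto_t M'}{\mathsf{write}\ M\ N @ \omega \mapsto_e \mathsf{write}\ M'\ N @ \omega}\ \textsf{Write}
\qquad
\frac{N \mapsto_t N'}{\mathsf{write}\ l\ N @ \omega \mapsto_e \mathsf{write}\ l\ N' @ \omega}\ \textsf{Write}'
\]

\[
\frac{[l \mapsto V' : A] \in \omega}{\mathsf{write}\ l\ V @ \omega \mapsto_e V @ \omega - [l \mapsto V' : A], [l \mapsto V : A]}\ \textsf{Write}''
\]

Figure 2.7: Typing rules and reduction rules for mutable references.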


In Lemma 2.13, we do not postulate that $\omega - [l \mapsto V' : A], [l \mapsto V : A]$ is a world.

Lemma 2.13.

If $\cdot \vdash_s V : A @ \omega$ and $[l \mapsto V' : A] \in \omega$ where $\omega$ is a world, then

$\Gamma \vdash_s M : B @ \omega$ implies $\Gamma \vdash_s M : B @ \omega - [l \mapsto V' : A], [l \mapsto V : A]$, and

$\Gamma \vdash_s E \div B @ \omega$ implies $\Gamma \vdash_s E \div B @ \omega - [l \mapsto V' : A], [l \mapsto V : A]$.

Proof. By simultaneous induction on the structure of $M$ and $E$. An interesting case is when $M = l$. □

Corollary 2.14.

If $\cdot \vdash_s V : A @ \omega$ and $[l \mapsto V' : A] \in \omega$ where $\omega$ is a world, then
$\omega - [l \mapsto V' : A], [l \mapsto V : A]$ is also a world.

Proof. Similarly to the proof of Corollary 2.12. □

Proposition 2.15 (Type-preservation/progress requirement on instructions). If $\cdot \vdash_s I \div A @ \omega$ and
arguments to $I$ are all values, then there exists a world $\omega'$ satisfying $I @ \omega \mapsto_e V @ \omega'$ and $\cdot \vdash_s V : A @ \omega'$.

Proof. By case analysis of $I$. We use Corollaries 2.12 and 2.14. □

For the monotonicity requirement on instructions, we directly prove Proposition 2.7 exploiting
Lemmas 2.11 and 2.13.

Proof of Proposition 2.7. Since the accessibility relation $\leq$ is specified by instruction reductions, $\omega \leq \omega'$
implies that

\[
\omega = \omega_1 \leq \cdots \leq \omega_i \leq \cdots \leq \omega_n = \omega',
\]

where $\omega_{i+1}$ is equal to either $\omega_i, [l \mapsto V : A]$ or $\omega_i - [l \mapsto V' : A], [l \mapsto V : A]$ for $1 \leq i < n$. We proceed
by induction on $n$. □
In order to maintain the monotonicity of $\leq$, all references in a world must be persistent, since once a
reference is deallocated, its type can no longer be determined. This means that an explicit instruction for
deallocating references (e.g., $\mathsf{delete}\ M$) is not allowed in $\lambda_{\bigcirc}$. In the present framework of $\lambda_{\bigcirc}$, even garbage
collections are not allowed because they destroy the monotonicity of $\leq$: a garbage collection transition
from $\omega$ to $\omega'$ must ensure that $l : \mathsf{ref}\ A @ \omega$ implies $l : \mathsf{ref}\ A @ \omega'$ for every possible reference $l$, including
those references not found in a given program, which are precisely what it deallocates. (In practice, garbage
collections do not interfere with evaluations and computations, and are safe to implement.) We use an empty
store as an initial world.
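
To make the store discipline concrete, here is a minimal Python sketch under simplifying assumptions: a world is a dictionary from integer locations to (value, type) pairs, types are plain strings, and the helper `accessible` (a hypothetical name) checks the monotonicity property that every reference typed in the earlier world keeps its type in the later one. It covers only the value-level rules New', Read', and Write''.

```python
def new(value, ty, world):
    """Rule New': allocate a fresh reference l and extend the world with [l -> value : ty]."""
    loc = len(world)  # fresh because references are never deallocated
    return loc, {**world, loc: (value, ty)}

def read(loc, world):
    """Rule Read': return the stored value; the world is unchanged."""
    value, _ = world[loc]
    return value, world

def write(loc, value, world):
    """Rule Write'': replace the stored value while keeping the type annotation."""
    _, ty = world[loc]
    return value, {**world, loc: (value, ty)}

def ref_type(loc, world):
    """Rule Ref: the type of a reference is determined by the world."""
    return ("ref", world[loc][1])

def accessible(earlier, later):
    """Monotonicity: every reference typed in `earlier` has the same type in `later`."""
    return all(l in later and later[l][1] == ty for l, (_, ty) in earlier.items())
```

Because `new` and `write` never remove an entry, `accessible` holds along every sequence of these operations; a hypothetical `delete` would break it, which is the point made in the paragraph above.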

2.4.4  Supporting  multiple  notions  of  world  effect 

Since a world structure realizes a specific notion of world effect and instructions provide an interface to
worlds, we can support multiple notions of world effect by combining individual world structures and letting
each instruction interact with its relevant part of worlds. For example, we can use all the above instructions
if a world consists of three sub-worlds: an infinite sequence of real numbers, input/output streams, and a
well-typed store. This is how $\lambda_{\bigcirc}$ combines world effects at the language design level.

We may think of $\lambda_{\bigcirc}$ as providing a built-in implementation of a state monad whose states are worlds.
Then the ease of combining world effects in $\lambda_{\bigcirc}$ reflects the fact that state monads combine well with each
other (by combining individual states).
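
The combination of sub-worlds can be sketched in Python as a record with one field per notion of world effect. The record type `World` and the three instruction functions are illustrative assumptions, not part of the original development; each instruction reads or updates only its own field and passes the rest of the world through unchanged.

```python
from typing import Iterator, NamedTuple

class World(NamedTuple):
    reals: Iterator[float]  # source of probabilistic choices
    io: tuple               # (input stream, output stream)
    store: dict             # mutable references

def sample(world):
    """Touches only the sequence of real numbers."""
    return next(world.reals), world

def write_char(c, world):
    """Touches only the input/output streams."""
    input_stream, output_stream = world.io
    return c, world._replace(io=(input_stream, (c,) + output_stream))

def new_ref(value, world):
    """Touches only the store."""
    loc = len(world.store)
    return loc, world._replace(store={**world.store, loc: value})
```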

2.5  Fixed  point  constructs 

In this section, we investigate an extension of $\lambda_{\bigcirc}$ with fixed point constructs. We first consider those based
upon the unfolding semantics, in which a fixed point construct reduces by unrolling itself. Next we consider
those based upon the backpatching semantics, as used in Scheme [3]. For expressions, we assume the
operational semantics in the direct style in Figure 2.6.

For a uniform treatment of types, we choose to allow fixed point constructs for all types. An alternative
approach would be to confine fixed point constructs only to lambda abstractions (as in ML), but it would be
inadequate for our purpose because recursive computations require fixed point constructs for computation
terms (of type $\bigcirc A$) anyway.

2.5.1  Unfolding  semantics 

We use $\mathsf{fix}\ x{:}A.\ M$ as a term fixed point construct for recursive evaluations. Its typing rule and reduction
rule are as usual:

\[
\text{term} \quad M ::= \cdots \mid \mathsf{fix}\ x{:}A.\ M
\]

\[
\frac{\Gamma, x : A \vdash_s M : A @ \omega}{\Gamma \vdash_s \mathsf{fix}\ x{:}A.\ M : A @ \omega}
\qquad
\frac{}{\mathsf{fix}\ x{:}A.\ M \mapsto_t [\mathsf{fix}\ x{:}A.\ M/x]M}
\]

In the presence of term fixed point constructs, any truth judgment $A\ true$ holds vacuously, since $\mathsf{fix}\ x{:}A.\ x$
typechecks for every type $A$ and represents a proof of $A\ true$. Now a term $M$ of type $A$ does not always
represent a constructive proof of $A\ true$; rather it may contain nonsensical proofs such as $\mathsf{fix}\ x{:}B.\ x$. The
definition of a computability judgment $A\ comp$, however, remains the same because it is defined relative to
a truth judgment $A\ true$.

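In conjunction with computation terms $\mathsf{cmp}\ E$, term fixed point constructs enable us to encode recursive
computations: we first build a term fixed point construct $M$ of type $\bigcirc A$ and then convert it into an expression
$\mathsf{letcmp}\ x \triangleleft M\ \mathsf{in}\ x$, which denotes a recursive computation. Generalizing this idea, we define syntactic sugar
for recursive computations. We introduce an expression variable $\mathbf{x}$ and an expression fixed point construct
$\mathsf{efix}\ \mathbf{x} \div A.\ E$; a new form of binding $\mathbf{x} \div A$ for expression variables is used in typing contexts:

\[
\begin{array}{llcl}
\text{expression} & E & ::= & \cdots \mid \mathbf{x} \mid \mathsf{efix}\ \mathbf{x} \div A.\ E \\
\text{typing context} & \Gamma & ::= & \cdots \mid \Gamma, \mathbf{x} \div A
\end{array}
\]

New typing rules and reduction rule are as follows:

\[
\frac{}{\Gamma, \mathbf{x} \div A \vdash_s \mathbf{x} \div A @ \omega}\ \textsf{Evar}
\qquad
\frac{\Gamma, \mathbf{x} \div A \vdash_s E \div A @ \omega}{\Gamma \vdash_s \mathsf{efix}\ \mathbf{x} \div A.\ E \div A @ \omega}\ \textsf{Efix}
\]

\[
\frac{}{\mathsf{efix}\ \mathbf{x} \div A.\ E @ \omega \mapsto_e [\mathsf{efix}\ \mathbf{x} \div A.\ E/\mathbf{x}]E @ \omega}\ \textsf{Efix}
\]

In the rule Efix, $[\mathsf{efix}\ \mathbf{x} \div A.\ E/\mathbf{x}]E$ denotes a capture-avoiding substitution of $\mathsf{efix}\ \mathbf{x} \div A.\ E$ for expression
variable $\mathbf{x}$. Thus $\mathsf{efix}\ \mathbf{x} \div A.\ E$ behaves like term fixed point constructs except that it unrolls itself by
substituting an expression for an expression variable, instead of a term for an ordinary variable.

To simulate expression fixed point constructs, we define a function $(\cdot)^*$ which translates $\mathsf{efix}\ \mathbf{x} \div A.\ E$
into:

\[
\mathsf{letcmp}\ y_r \triangleleft \mathsf{fix}\ x_p{:}\bigcirc A.\ \mathsf{cmp}\ [\mathsf{letcmp}\ y_v \triangleleft x_p\ \mathsf{in}\ y_v/\mathbf{x}]E^*\ \mathsf{in}\ y_r
\]

That is, we introduce a variable $x_p$ to encapsulate $\mathsf{efix}\ \mathbf{x} \div A.\ E$ and expand $\mathbf{x}$ to a bind expression $\mathsf{letcmp}\ y_v \triangleleft x_p\ \mathsf{in}\ y_v$.
The translation of other terms and expressions is structural; for the sake of simplicity, we do not consider
world terms and instructions: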

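\[
\begin{array}{rcl}
x^* & = & x \\
(\lambda x{:}A.\ M)^* & = & \lambda x{:}A.\ M^* \\
(M_1\ M_2)^* & = & M_1^*\ M_2^* \\
(\mathsf{cmp}\ E)^* & = & \mathsf{cmp}\ E^* \\
(\mathsf{fix}\ x{:}A.\ M)^* & = & \mathsf{fix}\ x{:}A.\ M^* \\
(\mathsf{letcmp}\ x \triangleleft M\ \mathsf{in}\ E)^* & = & \mathsf{letcmp}\ x \triangleleft M^*\ \mathsf{in}\ E^* \\
\mathbf{x}^* & = & \mathbf{x}
\end{array}
\]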


Proposition  2.17  shows  that  when  translated  via  the  function  (•)*,  the  typing  rules  Evar  and  Efix  are 
sound  with  respect  to  the  original  type  system  (without  the  rules  Evar  and  Efix). 

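Lemma 2.16.

If $\Gamma \vdash_s F \div A @ \omega$ and $\Gamma, \mathbf{x} \div A \vdash_s M : B @ \omega$, then $\Gamma \vdash_s [F/\mathbf{x}]M : B @ \omega$.

If $\Gamma \vdash_s F \div A @ \omega$ and $\Gamma, \mathbf{x} \div A \vdash_s E \div B @ \omega$, then $\Gamma \vdash_s [F/\mathbf{x}]E \div B @ \omega$.

Proof. By simultaneous induction on the structure of $M$ and $E$. □

Proposition 2.17.

If $\Gamma \vdash_s M : A @ \omega$, then $\Gamma \vdash_s M^* : A @ \omega$.

If $\Gamma \vdash_s E \div A @ \omega$, then $\Gamma \vdash_s E^* \div A @ \omega$.

Proof. By simultaneous induction on the structure of the derivation of $\Gamma \vdash_s M : A @ \omega$ and $\Gamma \vdash_s E \div A @ \omega$.
An interesting case is when $E = \mathsf{efix}\ \mathbf{x} \div A.\ F$.

Case $E = \mathsf{efix}\ \mathbf{x} \div A.\ F$:

\[
\begin{array}{ll}
\Gamma, \mathbf{x} \div A \vdash_s F \div A @ \omega & \text{by Efix} \\
\Gamma, \mathbf{x} \div A \vdash_s F^* \div A @ \omega & \text{by induction hypothesis} \\
\Gamma, x_p : \bigcirc A, \mathbf{x} \div A \vdash_s F^* \div A @ \omega & \text{by weakening} \\
\Gamma, x_p : \bigcirc A \vdash_s \mathsf{letcmp}\ y_v \triangleleft x_p\ \mathsf{in}\ y_v \div A @ \omega & \text{(typing derivation)} \\
\Gamma, x_p : \bigcirc A \vdash_s [\mathsf{letcmp}\ y_v \triangleleft x_p\ \mathsf{in}\ y_v/\mathbf{x}]F^* \div A @ \omega & \text{by Lemma 2.16} \\
\Gamma \vdash_s \mathsf{letcmp}\ y_r \triangleleft \mathsf{fix}\ x_p{:}\bigcirc A.\ \mathsf{cmp}\ [\mathsf{letcmp}\ y_v \triangleleft x_p\ \mathsf{in}\ y_v/\mathbf{x}]F^*\ \mathsf{in}\ y_r \div A @ \omega & \text{(typing derivation)} \\
\Gamma \vdash_s (\mathsf{efix}\ \mathbf{x} \div A.\ F)^* \div A @ \omega & \text{by the definition of } (\cdot)^*
\end{array}
\]

□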

Since $M^*$ and $E^*$ do not contain expression fixed point constructs, the rule Efix is not used in $\Gamma \vdash_s M^* : A @ \omega$
and $\Gamma \vdash_s E^* \div A @ \omega$. Neither is the rule Evar used unless $M$ or $E$ contains free expression variables.
Therefore, given a term or expression with no free expression variable, the function $(\cdot)^*$ returns another
term or expression of the same type which does not need the rules Evar and Efix.

Propositions 2.22 and 2.23 show that the reduction rule Efix is sound and complete with respect to the
operational semantics (in the direct style) in Section 2.3.5. We use the fact that the computation of $E^*$ does
not require the rule Efix.

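Proposition 2.18.

For any term $N$, we have $([N/x]M)^* = [N^*/x]M^*$ and $([N/x]E)^* = [N^*/x]E^*$.

For any expression $F$, we have $([F/\mathbf{x}]M)^* = [F^*/\mathbf{x}]M^*$ and $([F/\mathbf{x}]E)^* = [F^*/\mathbf{x}]E^*$.

Proof. By simultaneous induction on the structure of $M$ and $E$. □

Lemma 2.19. If $M \mapsto_t N$, then $M^* \mapsto_t N^*$.

Proof. By induction on the structure of the derivation of $M \mapsto_t N$. □

Lemma 2.20.

If $M^* \mapsto_t N'$, then there exists $N$ such that $N' = N^*$ and $M \mapsto_t N$.

Proof. By induction on the structure of the derivation of $M^* \mapsto_t N'$. □

We introduce an equivalence relation $\equiv_e$ on expressions to state that two expressions compute to the
same value.

Definition 2.21.

$E \equiv_e F$ if and only if $E @ \omega \mapsto_e^* V @ \omega'$ implies $F @ \omega \mapsto_e^* V @ \omega'$, and vice versa.

The following equivalences are used in proofs below:

\[
\begin{array}{l}
\mathsf{letcmp}\ x \triangleleft \mathsf{cmp}\ E\ \mathsf{in}\ x \equiv_e E \\
\mathsf{letcmp}\ x \triangleleft \mathsf{cmp}\ E\ \mathsf{in}\ F \equiv_e \mathsf{letcmp}\ x \triangleleft \mathsf{cmp}\ E'\ \mathsf{in}\ F \quad \text{where } E \equiv_e E' \\
(\mathsf{efix}\ \mathbf{x} \div A.\ E)^* \equiv_e [(\mathsf{efix}\ \mathbf{x} \div A.\ E)^*/\mathbf{x}]E^*
\end{array}
\]

The third equivalence follows from an expression reduction

\[
(\mathsf{efix}\ \mathbf{x} \div A.\ E)^* @ \omega \mapsto_e \mathsf{letcmp}\ y_r \triangleleft \mathsf{cmp}\ [(\mathsf{efix}\ \mathbf{x} \div A.\ E)^*/\mathbf{x}]E^*\ \mathsf{in}\ y_r @ \omega.
\]

Proposition 2.22.

If $E @ \omega \mapsto_e F @ \omega'$ with the rule Efix, then $E^* @ \omega \mapsto_e F' @ \omega'$ and $F' \equiv_e F^*$.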
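Proof. By induction on the structure of the derivation of $E @ \omega \mapsto_e F @ \omega'$. We consider the case
$E = \mathsf{letcmp}\ x \triangleleft M\ \mathsf{in}\ E_0$ where $M \neq \mathsf{cmp}\ E'$.

If $\mathsf{letcmp}\ x \triangleleft M\ \mathsf{in}\ E_0 @ \omega \mapsto_e \mathsf{letcmp}\ x \triangleleft N\ \mathsf{in}\ E_0 @ \omega$ by the rule Esind, then $M \mapsto_t N$.

By Lemma 2.19, $M^* \mapsto_t N^*$.

Since $(\mathsf{letcmp}\ x \triangleleft M\ \mathsf{in}\ E_0)^* = \mathsf{letcmp}\ x \triangleleft M^*\ \mathsf{in}\ E_0^*$ and $(\mathsf{letcmp}\ x \triangleleft N\ \mathsf{in}\ E_0)^* = \mathsf{letcmp}\ x \triangleleft N^*\ \mathsf{in}\ E_0^*$,
we have $(\mathsf{letcmp}\ x \triangleleft M\ \mathsf{in}\ E_0)^* @ \omega \mapsto_e (\mathsf{letcmp}\ x \triangleleft N\ \mathsf{in}\ E_0)^* @ \omega$.

Then we let $F' = (\mathsf{letcmp}\ x \triangleleft N\ \mathsf{in}\ E_0)^*$. □


Proposition 2.23.

If $E^* @ \omega \mapsto_e F' @ \omega'$, then there exists $F$ such that $F' \equiv_e F^*$ and $E @ \omega \mapsto_e F @ \omega'$.


Proof. By induction on the structure of the derivation of $E^* @ \omega \mapsto_e F' @ \omega'$. An interesting case is when
the rule Esind is applied last in a given derivation.

If $E = \mathsf{letcmp}\ x \triangleleft M\ \mathsf{in}\ E_0$, then $E^* = \mathsf{letcmp}\ x \triangleleft M^*\ \mathsf{in}\ E_0^*$.

By Lemma 2.20, there exists $N$ such that $M \mapsto_t N$ and $M^* \mapsto_t N^*$.

Hence we have $E @ \omega \mapsto_e \mathsf{letcmp}\ x \triangleleft N\ \mathsf{in}\ E_0 @ \omega'$ and $E^* @ \omega \mapsto_e \mathsf{letcmp}\ x \triangleleft N^*\ \mathsf{in}\ E_0^* @ \omega'$ (where
$\omega = \omega'$).

Then we let $F = \mathsf{letcmp}\ x \triangleleft N\ \mathsf{in}\ E_0$.

If $E = \mathsf{efix}\ \mathbf{x} \div A.\ E_0$, then $F' \equiv_e ([\mathsf{efix}\ \mathbf{x} \div A.\ E_0/\mathbf{x}]E_0)^*$ (and $\omega = \omega'$)

because $(\mathsf{efix}\ \mathbf{x} \div A.\ E_0)^* \equiv_e [(\mathsf{efix}\ \mathbf{x} \div A.\ E_0)^*/\mathbf{x}]E_0^* = ([\mathsf{efix}\ \mathbf{x} \div A.\ E_0/\mathbf{x}]E_0)^*$.

Then we let $F = [\mathsf{efix}\ \mathbf{x} \div A.\ E_0/\mathbf{x}]E_0$. □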


As seen in the definition of expression fixed point constructs, term fixed point constructs can leak into
expressions to give rise to recursive computations. Note that non-terminating computations in $\lambda_{\bigcirc}$ are not
necessarily due to (term or expression) fixed point constructs, since mutable references can also be
exploited to encode recursive computations. For example, the following expression initiates a non-terminating
computation in which reference $x$ stores a computation term which dereferences itself:


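\[
\begin{array}{l}
\mathsf{letcmp}\ x \triangleleft \mathsf{cmp}\ \mathsf{new}\ \mathsf{cmp}\ 0\ \mathsf{in} \\
\mathsf{letcmp}\ y \triangleleft \mathsf{cmp}\ \mathsf{write}\ x\ \mathsf{cmp}\ (\mathsf{letcmp}\ y \triangleleft \mathsf{cmp}\ \mathsf{read}\ x\ \mathsf{in}\ \mathsf{letcmp}\ z \triangleleft y\ \mathsf{in}\ z)\ \mathsf{in} \\
\mathsf{letcmp}\ z \triangleleft y\ \mathsf{in}\ z \\
\qquad @\ \cdot \\[1ex]
\mapsto_e^* \\[1ex]
\mathsf{letcmp}\ y \triangleleft \mathsf{cmp}\ \mathsf{write}\ l\ \mathsf{cmp}\ (\mathsf{letcmp}\ y \triangleleft \mathsf{cmp}\ \mathsf{read}\ l\ \mathsf{in}\ \mathsf{letcmp}\ z \triangleleft y\ \mathsf{in}\ z)\ \mathsf{in} \\
\mathsf{letcmp}\ z \triangleleft y\ \mathsf{in}\ z \\
\qquad @\ [l \mapsto \mathsf{cmp}\ 0 : \bigcirc \mathsf{int}] \\[1ex]
\mapsto_e^* \\[1ex]
\mathsf{letcmp}\ z \triangleleft \mathsf{cmp}\ (\mathsf{letcmp}\ y \triangleleft \mathsf{cmp}\ \mathsf{read}\ l\ \mathsf{in}\ \mathsf{letcmp}\ z \triangleleft y\ \mathsf{in}\ z)\ \mathsf{in}\ z \\
\qquad @\ [l \mapsto \mathsf{cmp}\ (\mathsf{letcmp}\ y \triangleleft \mathsf{cmp}\ \mathsf{read}\ l\ \mathsf{in}\ \mathsf{letcmp}\ z \triangleleft y\ \mathsf{in}\ z) : \bigcirc \mathsf{int}] \\[1ex]
\mapsto_e^* \\[1ex]
\mathsf{letcmp}\ z \triangleleft \mathsf{cmp}\ (\mathsf{letcmp}\ z \triangleleft \mathsf{cmp}\ (\mathsf{letcmp}\ y \triangleleft \mathsf{cmp}\ \mathsf{read}\ l\ \mathsf{in}\ \mathsf{letcmp}\ z \triangleleft y\ \mathsf{in}\ z)\ \mathsf{in}\ z)\ \mathsf{in}\ z \\
\qquad @\ [l \mapsto \mathsf{cmp}\ (\mathsf{letcmp}\ y \triangleleft \mathsf{cmp}\ \mathsf{read}\ l\ \mathsf{in}\ \mathsf{letcmp}\ z \triangleleft y\ \mathsf{in}\ z) : \bigcirc \mathsf{int}]
\end{array}
\]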
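
The same knot-tying through the store can be reproduced in Python. The sketch below is an analogy rather than a rendering of the calculus: the store is a dictionary, a computation term is a zero-argument function, and a step budget `fuel` (an assumption added so that the example stops) stands in for observing non-termination.

```python
class OutOfFuel(Exception):
    """Raised when the step budget is exhausted (a stand-in for 'does not terminate')."""

def run_knot(fuel):
    store = {}
    store["l"] = lambda: 0            # new (cmp 0): the reference initially holds cmp 0
    steps = 0

    def looping_computation():
        nonlocal steps
        steps += 1
        if steps > fuel:
            raise OutOfFuel(steps)
        y = store["l"]                # read l: fetch the stored computation term
        return y()                    # letcmp z <| y in z: run it and return its result

    store["l"] = looping_computation  # write l (cmp ...): tie the knot through the store
    y = store["l"]
    return y()                        # the stored computation keeps dereferencing itself
```

No fixed point construct appears in the code; the recursion arises only because the stored computation reads the very reference that holds it.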


2.5.2  Backpatching  semantics 

Unlike the unfolding semantics, the backpatching semantics evaluates or computes a fixed point construct
by first finishing the reduction of its body and then “tying a recursive knot”, or “backpatching” the result.
For term evaluations, the two semantics are equivalent except that when the unfolding semantics gives rise
to an infinite loop, the backpatching semantics generates an error.

We investigate a fixed point construct $\mathsf{vfix}\ \underline{z}{:}A.\ E$ for expressions that is based upon the backpatching
semantics. Unlike $\mathsf{efix}\ \mathbf{x} \div A.\ E$, which computes a fixed point over both values and world effects and thus
$\mathbf{x}$ is interpreted as an expression, it computes a fixed point only over values and $\underline{z}$ in it is a term.¹ For
this reason, the computation is usually referred to as value recursion [18]. Similar constructs are found in
Erkök and Launchbury [18] (fixed point construct mfix in Haskell) and Launchbury and Peyton Jones [37]
(recursive state transformer fixST in Haskell).


¹ In this regard, the two fixed point constructs for expressions cannot be compared directly.
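Syntax and type system

We introduce a recursion variable $\underline{z}$ (with an underscore) as a term and a value recursion construct $\mathsf{vfix}\ \underline{z}{:}A.\ E$
as an expression:

\[
\begin{array}{llcl}
\text{term} & M & ::= & \cdots \mid \underline{z} \\
\text{expression} & E & ::= & \cdots \mid \mathsf{vfix}\ \underline{z}{:}A.\ E
\end{array}
\]

A substitution for $\underline{z}$ is defined in a standard way. To simplify the presentation of the type preservation
theorem (Theorem 2.25), we separate recursion variables from ordinary variables in the type system by
introducing a value recursion context $\Sigma$ for recursion variables:

\[
\text{value recursion context} \quad \Sigma ::= \cdot \mid \Sigma, \underline{z} : A
\]

A typing judgment now includes a value recursion context to record the type of each recursion variable:

\[
\begin{array}{ll}
\text{term typing judgment} & \Gamma; \Sigma \vdash_s M : A @ \omega \\
\text{expression typing judgment} & \Gamma; \Sigma \vdash_s E \div A @ \omega
\end{array}
\]

Typing rules for judgments $\Gamma \vdash_s M : A @ \omega$ and $\Gamma \vdash_s E \div A @ \omega$ induce those for judgments $\Gamma; \Sigma \vdash_s M : A @ \omega$
and $\Gamma; \Sigma \vdash_s E \div A @ \omega$ in a straightforward way (by adding $\Sigma$ to every judgment). We also need additional
rules for recursion variables and value recursion constructs:

\[
\frac{}{\Gamma; \Sigma, \underline{z} : A \vdash_s \underline{z} : A @ \omega}\ \textsf{Vvar}
\qquad
\frac{\Gamma; \Sigma, \underline{z} : A \vdash_s E \div A @ \omega}{\Gamma; \Sigma \vdash_s \mathsf{vfix}\ \underline{z}{:}A.\ E \div A @ \omega}\ \textsf{Vfix}
\]

The monotonicity of the accessibility relation $\leq$ (in Proposition 2.7) is now stated with new typing
judgments.

Proposition 2.24.

If $\omega \leq \omega'$, then

$\Gamma; \Sigma \vdash_s M : A @ \omega$ implies $\Gamma; \Sigma \vdash_s M : A @ \omega'$, and
$\Gamma; \Sigma \vdash_s E \div A @ \omega$ implies $\Gamma; \Sigma \vdash_s E \div A @ \omega'$.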

Operational  semantics 

Conceptually we compute $\mathsf{vfix}\ \underline{z}{:}A.\ E$ as follows: first we bind $\underline{z}$ to a black hole so that any premature
attempt to read it results in a value recursion error; next we compute $E$ to obtain a value $V$; finally we
“backpatch” every occurrence of $\underline{z}$ in $V$ with $V$ itself and return the backpatched value as the result.

One approach to backpatching $\underline{z}$ with $V$ is by replacing $\underline{z}$ by a fixed point construct $\mathsf{fix}\ \underline{z}{:}A.\ V$ (as in
[47]). A problem with this approach is that $\underline{z}$ may appear at the resultant world after computing $E$. That is, if
$E$ at a world $\omega$ computes to $V$ at another world $\omega'$, $\underline{z}$ may be used by $\omega'$. Then we would need substitutions
on worlds as well (e.g., $[\mathsf{fix}\ \underline{z}{:}A.\ V/\underline{z}]\omega'$), which should be defined for each kind of world effect and thus
we want to avoid; besides, the type preservation property becomes difficult to prove.

To eliminate the need for substitutions on worlds, we maintain a recursion store $\sigma$. It associates each
recursion variable with a value $V$:

\[
\text{recursion store} \quad \sigma ::= \cdot \mid \sigma, \underline{z} = V
\]

Now we reformulate the operational semantics with two reduction judgments:

• A term reduction $M, \sigma \mapsto_t N$ means that $M$ with recursion store $\sigma$ reduces to $N$.
• An expression reduction $E @ \omega, \sigma \mapsto_e F @ \omega', \sigma'$ means that $E$ at world $\omega$ with recursion store $\sigma$
reduces to $F$ at world $\omega'$ with recursion store $\sigma'$.

A term reduction requires (but does not update) a recursion store because it may read recursion variables. An
expression reduction may update both a world (by reducing instructions) and a recursion store (by reducing
value recursion constructs). Reduction rules for judgments $M \mapsto_t N$ and $E @ \omega \mapsto_e F @ \omega'$ induce those
for judgments $M, \sigma \mapsto_t N$ and $E @ \omega, \sigma \mapsto_e F @ \omega', \sigma'$ in a straightforward way (by adding $\sigma$ to every
judgment).

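Instead of directly modeling black holes with certain special values, we indirectly model black holes by
reducing $\mathsf{vfix}\ \underline{z}{:}A.\ E$ to an intermediate value recursion construct $\mathsf{vfix}_{\bullet}\ \underline{z}{:}A.\ E$. That is, the presence of
$\mathsf{vfix}_{\bullet}\ \underline{z}{:}A.\ E$ means that $\underline{z}$ is assumed to be bound to a black hole and that $E$ is currently being reduced; if
a term in $E$ attempts to read $\underline{z}$, it results in a value recursion error and the whole reduction gets stuck. The
typing rule for $\mathsf{vfix}_{\bullet}\ \underline{z}{:}A.\ E$ is the same as for $\mathsf{vfix}\ \underline{z}{:}A.\ E$:

\[
\text{expression} \quad E ::= \cdots \mid \mathsf{vfix}_{\bullet}\ \underline{z}{:}A.\ E
\]

\[
\frac{\Gamma; \Sigma, \underline{z} : A \vdash_s E \div A @ \omega}{\Gamma; \Sigma \vdash_s \mathsf{vfix}_{\bullet}\ \underline{z}{:}A.\ E \div A @ \omega}\ \textsf{Vfix}_{\bullet}
\]

The rules for reducing recursion variables and value recursion constructs are as follows:

\[
\frac{\underline{z} = V \in \sigma}{\underline{z}, \sigma \mapsto_t V}\ \textsf{Vvar}
\qquad
\frac{}{\mathsf{vfix}\ \underline{z}{:}A.\ E @ \omega, \sigma \mapsto_e \mathsf{vfix}_{\bullet}\ \underline{z}{:}A.\ E @ \omega, \sigma}\ \textsf{Vfix}_{init}
\]

\[
\frac{E @ \omega, \sigma \mapsto_e F @ \omega', \sigma'}{\mathsf{vfix}_{\bullet}\ \underline{z}{:}A.\ E @ \omega, \sigma \mapsto_e \mathsf{vfix}_{\bullet}\ \underline{z}{:}A.\ F @ \omega', \sigma'}\ \textsf{Vfix}_{red}
\qquad
\frac{\underline{z} = V' \notin \sigma}{\mathsf{vfix}_{\bullet}\ \underline{z}{:}A.\ V @ \omega, \sigma \mapsto_e V @ \omega, (\sigma, \underline{z} = V)}\ \textsf{Vfix}_{bpatch}
\]

These rules ensure that any premature attempt to read a recursion variable bound to a black hole results in a
value recursion error and the whole reduction gets stuck. The rule Vvar implies that $\underline{z}$ is not a value in itself.
The rule Vfixinit initiates the computation of $\mathsf{vfix}\ \underline{z}{:}A.\ E$ by reducing it to $\mathsf{vfix}_{\bullet}\ \underline{z}{:}A.\ E$; the rule Vfixred
reduces the body $E$ of $\mathsf{vfix}_{\bullet}\ \underline{z}{:}A.\ E$; the rule Vfixbpatch backpatches $\underline{z}$ with $V$. Note that $\alpha$-conversion is
freely applicable even to $\mathsf{vfix}_{\bullet}\ \underline{z}{:}A.\ E$.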
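
The three stages (initialize, reduce the body, backpatch) can be imitated in Python as follows. This is a sketch under stated assumptions, not the semantics itself: it uses an explicit sentinel `BLACK_HOLE`, which the text above deliberately avoids in favor of the intermediate construct, and it models a recursion variable as a zero-argument function `z` that looks itself up in a one-entry recursion store. The names `vfix`, `BLACK_HOLE`, and `ValueRecursionError` are hypothetical.

```python
class ValueRecursionError(Exception):
    """A recursion variable was read before it was backpatched."""

BLACK_HOLE = object()

def vfix(body):
    """Compute body(z), where z reads the recursion variable, then backpatch z with the result."""
    store = {"z": BLACK_HOLE}   # initialization: z is bound to a black hole

    def z():
        # Reading the recursion variable: allowed only after backpatching.
        if store["z"] is BLACK_HOLE:
            raise ValueRecursionError("recursion variable read before backpatching")
        return store["z"]

    value = body(z)             # reduce the body while z is a black hole
    store["z"] = value          # backpatch: extend the recursion store with z = value
    return value

# A value that refers to itself only under a delay: an infinite stream of ones.
ones = vfix(lambda z: (1, z))
```

Here `ones[1]()` returns `ones` itself, whereas `vfix(lambda z: z())` reads the variable prematurely and raises `ValueRecursionError`, corresponding to a reduction that gets stuck.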

The reduction rule Vfixbpatch assumes dynamic renaming of recursion variables so that all recursion
variables in a recursion store remain distinct. As an example, consider the following expression:

\[
\mathsf{letcmp}\ x_1 \triangleleft \mathsf{cmp}\ \mathsf{vfix}\ \underline{z}{:}A.\ E_1\ \mathsf{in}\ \mathsf{letcmp}\ x_2 \triangleleft \mathsf{cmp}\ \mathsf{vfix}\ \underline{z}{:}A.\ E_2\ \mathsf{in}\ F
\]

Although we do not need to rename either instance of $\underline{z}$ during typechecking, we have to rename the second
instance after computing $\mathsf{vfix}\ \underline{z}{:}A.\ E_2$ because the recursion store already contains a recursion variable of
the same name.

Since the result of an evaluation or a computation may contain recursion variables, we need to incorporate
recursion stores or their abstractions in stating the type preservation property. We use value recursion
contexts for this purpose as they are essentially the result of typing recursion stores. Formally we write
$\models \sigma : \Sigma @ \omega$ if there exists a one-to-one correspondence between $\underline{z} = V \in \sigma$ and $\underline{z} : A \in \Sigma$ such that
$\cdot; \Sigma \vdash_s V : A @ \omega$ holds. Now the type preservation property is stated as follows:
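Theorem 2.25 (Type preservation). Suppose $\models \sigma : \Sigma @ \omega$.

If $M, \sigma \mapsto_t N$ and $\cdot; \Sigma \vdash_s M : A @ \omega$, then $\cdot; \Sigma \vdash_s N : A @ \omega$.

If $E @ \omega, \sigma \mapsto_e F @ \omega', \sigma'$ and $\cdot; \Sigma \vdash_s E \div A @ \omega$, then there exists $\Sigma'$ such that $\cdot; \Sigma' \vdash_s F \div A @ \omega'$
and $\models \sigma' : \Sigma' @ \omega'$.


Proof. By induction on the structure of the derivation of $M, \sigma \mapsto_t N$ and $E @ \omega, \sigma \mapsto_e F @ \omega', \sigma'$.
Interesting cases are when one of the rules Vvar, Vfixinit, Vfixred, and Vfixbpatch is applied last in a given
derivation. We consider two representative cases below.

Case $\dfrac{\underline{z} = V \in \sigma}{\underline{z}, \sigma \mapsto_t V}\ \textsf{Vvar}$:

$\cdot; \Sigma \vdash_s \underline{z} : A @ \omega$ implies $\underline{z} : A \in \Sigma$ by the rule Vvar.
From $\models \sigma : \Sigma @ \omega$, $\underline{z} = V \in \sigma$, and $\underline{z} : A \in \Sigma$,

we have $\cdot; \Sigma \vdash_s V : A @ \omega$.

Case $\dfrac{\underline{z} = V' \notin \sigma}{\mathsf{vfix}_{\bullet}\ \underline{z}{:}A.\ V @ \omega, \sigma \mapsto_e V @ \omega, (\sigma, \underline{z} = V)}\ \textsf{Vfix}_{bpatch}$:

Since $\models \sigma : \Sigma @ \omega$,

for any $\underline{z}' = V' \in \sigma$, we have $\cdot; \Sigma \vdash_s V' : A' @ \omega$ and $\underline{z}' : A' \in \Sigma$ for some type $A'$.

We let $\Sigma' = \Sigma, \underline{z} : A$.

Then, for any $\underline{z}' = V' \in \sigma$, we have $\cdot; \Sigma' \vdash_s V' : A' @ \omega$ and $\underline{z}' : A' \in \Sigma'$ for some type $A'$.
By the rule Vfix$_{\bullet}$, $\cdot; \Sigma \vdash_s \mathsf{vfix}_{\bullet}\ \underline{z}{:}A.\ V \div A @ \omega$ implies $\cdot; \Sigma, \underline{z} : A \vdash_s V \div A @ \omega$.

Then $\cdot; \Sigma' \vdash_s V : A @ \omega$ and $\underline{z} : A \in \Sigma'$.

Therefore $\models \sigma, \underline{z} = V : \Sigma' @ \omega$.

□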


Since  the  type  system  does  not  detect  value  recursion  errors,  the  computation  of  a  well-typed  expression 
may  end  up  with  a  value  recursion  error.  To  catch  value  recursion  errors  statically,  we  can  adopt  advanced 
type  systems  for  value  recursion  in  [9,  16]. 


Simulating  value  recursion  constructs 

Section 2.5.1 has shown that $\mathsf{efix}\ \mathbf{x} \div A.\ E$ can be simulated with $\mathsf{fix}\ x{:}A.\ M$. Can we also simulate
$\mathsf{vfix}\ \underline{z}{:}A.\ E$ with $\mathsf{fix}\ x{:}A.\ M$? In Haskell, a value recursion construct mfix for a specific monad can be
defined in terms of the ordinary fixed point construct fix. For example, Moggi and Sabry [47] show that for
a state monad $\mathrm{M}\ A = S \to (A \times S)$ where $\mathrm{M}$ is a type constructor and $S$ is the type of states, mfix can be
defined as follows:

\[
\mathsf{mfix}\ x{:}A.\ M = \lambda s{:}S.\ \mathsf{fix}\ p{:}A \times S.\ (\lambda x{:}A.\ M)\ (\mathsf{fst}\ p)\ s
\]

Here we use a product type $A \times S$ and a projection term $\mathsf{fst}\ p$; both $M$ and $\mathsf{mfix}\ x{:}A.\ M$ have type
$\mathrm{M}\ A = S \to (A \times S)$. Since the type constructor $\bigcirc$ in $\lambda_{\bigcirc}$ essentially forms a state monad, it may appear
that we can define $\mathsf{vfix}\ \underline{z}{:}A.\ E$ in terms of $\mathsf{fix}\ x{:}A.\ M$. Unlike the state monad $\mathrm{M}\ A$, however, we cannot
access states (i.e., worlds) as terms. Therefore we cannot exploit the above idea to simulate $\mathsf{vfix}\ \underline{z}{:}A.\ E$ with
$\mathsf{fix}\ x{:}A.\ M$.

Another idea to simulate $\mathsf{vfix}\ \underline{z}{:}A.\ E$ is to use instructions for mutable references: to compute $\mathsf{vfix}\ \underline{z}{:}A.\ E$,
we initialize a fresh reference for $\underline{z}$; to backpatch $\underline{z}$, we update the store. In this case, $\underline{z}$ can no longer be a
term because its evaluation requires an access to the store. In other words, $\underline{z}$ should now be defined as an
expression.
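\[
\begin{array}{llcl}
\text{term} & M & ::= & \cdots \mid \mathsf{cont}_t\ \kappa \mid \mathsf{callcc}_t\ x.\ M \mid \mathsf{throw}_t\ M\ M \\
\text{value} & V & ::= & \cdots \mid \mathsf{cont}_t\ \kappa \\
\text{evaluation context} & \kappa & ::= & [\,] \mid \kappa\ M \mid (\lambda x{:}A.\ M)\ \kappa \mid \mathsf{throw}_t\ \kappa\ M \mid \mathsf{throw}_t\ (\mathsf{cont}_t\ \kappa')\ \kappa
\end{array}
\]

Figure 2.8: Syntax for continuations for terms.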


2.6  Continuations 

So far, we have restricted ourselves to world effects, i.e., transitions between worlds. λ○ confines world effects to expressions so that terms are free of world effects. When we extend λ○ with control effects, however, it is not immediately clear which syntactic category should be permitted to produce control effects. On one hand, we could choose to confine control effects to expressions so that terms remain free of any kind of effect. Then the distinction between effect-free evaluations and effectful computations is drawn in a conventional sense. On the other hand, in order to develop λ○ into a practical programming language, it is desirable to allow control effects in terms. For example, exceptions for terms would be an easy way to handle division by zero or pattern-match failures occurring during evaluations. At the same time, however, exceptions for expressions are also useful for those instructions whose execution does not always succeed.

We hold the view that expressions are in principle a syntactic category specialized for world effects, and allow control effects in both terms and expressions. The decision does not prevent us from developing control effects orthogonally to world effects, since control effects are realized with reduction rules whereas world effects are realized with world structures. In fact, there is no reason to confine control effects only to one syntactic category, since the concept of control effect is relative to what constitutes the “basic” reduction rules anyway.

As an example of control effect, we consider continuations. We consider two kinds: one for terms and another for expressions. A continuation for terms denotes an evaluation parameterized over terms; a continuation for expressions denotes a computation parameterized over terms. The two are independent notions, and we discuss them separately. Since we are primarily interested in how continuations change the state of the run-time system, we focus on the operational semantics only; for the type system, we refer the reader to the literature (e.g., [25]).

In the syntax, we assume value recursion constructs, which interact with continuations for expressions in an interesting way. Hence we continue to use the two reduction judgments M, σ ↦_t N and E @ ω, σ ↦_e F @ ω', σ' from Section 2.5.2 (but in a different style).

2.6.1  Continuations  for  terms 

Figure 2.8 shows the syntax for continuations for terms. An evaluation context κ is a term with a hole [] which can be filled with a term M to produce another term κ[M]; it assumes a call-by-value discipline. cont_t κ lifts an evaluation context κ to a value and is called a term continuation. callcc_t and throw_t are constructs for capturing and throwing term continuations, respectively.

The operational semantics in Figure 2.9 uses a reduction judgment of the form κ[M], σ ↦_t κ'[N], where σ is a recursion store. Note that it is the same term reduction judgment as in Section 2.5.2 because both κ[M] and κ'[N] are terms. The rule CTred uses a term reduction M ↦_βterm N. The rule CTcallcc binds variable x to a term continuation containing the current evaluation context κ; the rule CTthrow nullifies the current evaluation context κ to activate a new evaluation context κ'.
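
The behavior of capturing and throwing a term continuation can be mimicked in continuation-passing style. The following Python sketch is only an illustration and not part of the formal development: an evaluation context is modelled as a Python function from a value to the final answer, and the names callcc_t, throw_t, and example are hypothetical.

```python
# Sketch: term continuations in continuation-passing style (CPS).
# An evaluation context kappa is modelled as a function from a value
# to the final answer. All names here are illustrative assumptions.

def callcc_t(body, kappa):
    """Capture: bind x to the current context and keep running in it."""
    return body(kappa, kappa)

def throw_t(captured, value, _kappa):
    """Throw: discard the current context _kappa and resume `captured`."""
    return captured(value)

def example(kappa0):
    # Models the term  1 + callcc_t x. (10 + throw_t x 5)
    outer = lambda v: kappa0(1 + v)          # kappa  = 1 + []
    def body(x, k):
        inner = lambda v: k(10 + v)          # kappa' = 1 + (10 + [])
        return throw_t(x, 5, inner)          # inner is never used
    return callcc_t(body, outer)

result = example(lambda v: v)
```

Because the throw discards the pending addition of 10, the value 5 is delivered directly to the captured context 1 + [].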

The formulation of continuations for terms is standard. What is interesting is that from a logical perspective, continuations for terms change the meaning of A true from intuitionistic truth to classical truth [23]. The change in the meaning of A true, however, does not mean that we have to change the definition of expressions accordingly, since our definition of A comp is not subject to a particular definition of A true. In other words, even if we change the meaning of A true, the same definition of A comp remains valid with respect to the new definition of A true; hence the previous definition of expressions also remains valid.

\[
\frac{M \mapsto_{\beta\mathrm{term}} N}{\kappa[M],\ \sigma \mapsto_t \kappa[N]}\ \mathsf{CTred}
\qquad
\frac{z = V \in \sigma}{\kappa[z],\ \sigma \mapsto_t \kappa[V]}\ \mathsf{CTvvar}
\]
\[
\frac{}{\kappa[\mathsf{callcc}_t\ x.\,M],\ \sigma \mapsto_t \kappa[[\mathsf{cont}_t\ \kappa/x]M]}\ \mathsf{CTcallcc}
\qquad
\frac{}{\kappa[\mathsf{throw}_t\ (\mathsf{cont}_t\ \kappa')\ V],\ \sigma \mapsto_t \kappa'[V]}\ \mathsf{CTthrow}
\]

Figure 2.9: Reduction rules for continuations for terms.

\[
\begin{array}{llcl}
\text{term} & M & ::= & \cdots \mid \mathsf{cont}_e\ \phi \\
\text{value} & V & ::= & \cdots \mid \mathsf{cont}_e\ \phi \\
\text{expression} & E & ::= & \cdots \mid \mathsf{callcc}_e\ x.\,E \mid \mathsf{throw}_e\ M\ E \\
\text{computation context} & \phi & ::= & [\,]_e \mid [\,]_t \mid \mathsf{letcmp}\ x \triangleleft [\,]_t\ \mathsf{in}\ E \mid \mathsf{letcmp}\ x \triangleleft \mathsf{cmp}\ \phi\ \mathsf{in}\ E \\
 & & \mid & \mathsf{vfix}_{\bullet}\ z{:}A.\,\phi \mid \mathsf{throw}_e\ [\,]_t\ E \mid \mathsf{throw}_e\ (\mathsf{cont}_e\ \phi)\ \phi
\end{array}
\]

Figure 2.10: Syntax for continuations for expressions.

2.6.2  Continuations  for  expressions 

Figure 2.10 shows the syntax for continuations for expressions. A computation context φ is an expression with a hole []_t or []_e. []_t can be filled only with a term, and []_e only with an expression. cont_e φ lifts a computation context φ to a value and is called an expression continuation. callcc_e and throw_e are constructs for capturing and throwing expression continuations, respectively.

The operational semantics in Figure 2.11 uses a reduction judgment of the form φ[E] @ ω, σ ↦_e φ'[E'] @ ω', σ'. Note that it is the same expression reduction judgment as in Section 2.5.2 because both φ[E] and φ'[E'] are expressions. The rule CEcallcc binds variable x to an expression continuation containing the current computation context φ; the rule CEthrow nullifies the current computation context φ to activate a new computation context φ'. By the rule CEvfixo, a computation context vfix• z: A. φ marks that z is bound to a black hole.

\[
\frac{M,\ \sigma \mapsto_t N}{\phi[M] \mathbin{@} \omega, \sigma \mapsto_e \phi[N] \mathbin{@} \omega, \sigma}\ \mathsf{CEtred}
\qquad
\frac{}{\phi[\mathsf{letcmp}\ x \triangleleft \mathsf{cmp}\ V\ \mathsf{in}\ E] \mathbin{@} \omega, \sigma \mapsto_e \phi[[V/x]E] \mathbin{@} \omega, \sigma}\ \mathsf{CEbind}
\]
\[
\frac{}{\phi[\mathsf{callcc}_e\ x.\,E] \mathbin{@} \omega, \sigma \mapsto_e \phi[[\mathsf{cont}_e\ \phi/x]E] \mathbin{@} \omega, \sigma}\ \mathsf{CEcallcc}
\]
\[
\frac{}{\phi[\mathsf{throw}_e\ (\mathsf{cont}_e\ \phi')\ V] \mathbin{@} \omega, \sigma \mapsto_e \phi'[V] \mathbin{@} \omega, \sigma}\ \mathsf{CEthrow}
\]
\[
\frac{}{\phi[\mathsf{vfix}\ z{:}A.\,E] \mathbin{@} \omega, \sigma \mapsto_e \phi[\mathsf{vfix}_{\bullet}\ z{:}A.\,E] \mathbin{@} \omega, \sigma}\ \mathsf{CEvfixo}
\]
\[
\frac{}{\phi[\mathsf{vfix}_{\bullet}\ z{:}A.\,V] \mathbin{@} \omega, \sigma \mapsto_e \phi[V] \mathbin{@} \omega, (\sigma, z = V)}\ \mathsf{CEvfixc}
\]

Figure 2.11: Reduction rules for continuations for expressions.

It is important that the rule CEvfixc does not require z = V' ∉ σ in the premise; if z = V' is already in σ, it is removed in σ, z = V (so that all recursion variables remain distinct). The reason is that an expression continuation that has been captured before the completion of the computation of vfix z: A. E may be thrown after its completion. In this case, recursion variable z is already bound to the value that the previous computation of vfix z: A. E has returned. We can exploit this property to show that, for example, vfix z: A. letcmp x ◁ M in E and letcmp x ◁ M in vfix z: A. E behave differently even when z is not free in M. (Erkök and Launchbury [18] call the equivalence between the two expressions the left-shrinking property of value recursion.)

Consider an expression

    vfix z: A. letcmp x ◁ cmp callcc_e y. E in F

where z is not free in E. The expression continuation captured by callcc_e y. E may escape the scope of the whole value recursion construct. When it is thrown later, z is already bound to a value and every attempt to read z in F succeeds without raising a value recursion error. This is not the case for the following expression:

    letcmp x ◁ cmp callcc_e y. E in vfix z: A. F

During the computation of F, z is bound to a black hole by the rule CEvfixo. Consequently any attempt to read z in F results in a value recursion error.
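
The difference between the two expressions can be illustrated with a small model of the recursion store. The Python sketch below is a deliberate simplification and not the semantics above: it records the black hole in the store itself rather than in the computation context, and the names RecursionStore, open, close, and read are hypothetical.

```python
# Sketch of a recursion store with black holes (all names are assumptions).

class ValueRecursionError(Exception):
    """Raised when a recursion variable is read while it is a black hole."""

BLACK_HOLE = object()

class RecursionStore:
    def __init__(self):
        self.bindings = {}

    def open(self, z):
        # Entering a value recursion construct marks z as a black hole.
        self.bindings[z] = BLACK_HOLE

    def close(self, z, value):
        # Completing the body overwrites any earlier binding of z.
        self.bindings[z] = value

    def read(self, z):
        value = self.bindings[z]
        if value is BLACK_HOLE:
            raise ValueRecursionError(z)
        return value

store = RecursionStore()

# Case 1: the continuation was captured inside the value recursion construct
# and is thrown after the first completion, so z is already bound.
store.open("z")
store.close("z", 1)
first_read = store.read("z")    # succeeds
store.close("z", 2)             # the second completion replaces the binding

# Case 2: the continuation was captured before the value recursion construct,
# so throwing it re-enters the construct and z becomes a black hole again.
store.open("z")
try:
    store.read("z")
    premature = False
except ValueRecursionError:
    premature = True
```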

In general, value recursion is unsafe in the presence of expression continuations because a value recursion construct may compute to a value containing unresolved recursion variables, that is, recursion variables bound to black holes (the counter-example in [47] can be rewritten in λ○). An error resulting from reading an unresolved recursion variable is similar to a value recursion error in that both result from an attempt to read a recursion variable bound to a black hole. The difference is that while a value recursion error results from a premature attempt to read a recursion variable that will be eventually bound to a value, an unresolved recursion variable remains bound to a black hole forever.


2.7  Summary 

Moggi's monadic metalanguage λ_ml [44, 45] has served as the de facto standard for subsequent monadic languages [36, 37, 6, 70, 46, 78, 47]. Benton, Bierman, and de Paiva [7] show that from a type-theoretic perspective, λ_ml is connected to lax logic via the Curry-Howard isomorphism. Pfenning and Davies [60] reformulate λ_ml by applying Martin-Löf's methodology of distinguishing between propositions and judgments [42] to lax logic. The new formulation of λ_ml draws a syntactic distinction between values and computations, and uses the modality ○ for computations. It is used in the design of a security-typed monadic language [13]; its underlying modal type theory inspires type systems in [4, 5] and effect systems in [51, 52].

The idea of the syntactic distinction but without an explicit modality for computations is used by Petersen et al. [54]. The same idea is also used by Mandelbaum, Walker, and Harper [41]. Their language is similar to λ○ in that the operational semantics (but not the type system) uses an accessibility relation between worlds. The meaning of a world is, however, slightly different: a world in their language is a collection of facts on a world in λ○.

λ○ extends the new formulation of λ_ml by Pfenning and Davies with an operational semantics to support concrete notions of computational effect. Compared with those monadic languages based upon λ_ml, it does not strictly increase the expressive power: it is straightforward to devise a translation from λ○ to a typical monadic language based upon λ_ml and vice versa. In this regard, the syntactic distinction in λ○ may be thought of as a cosmetic change to the syntax of monadic languages. It, however, inspires a new approach to incorporating computational effects into monadic languages by allowing control effects both in terms and in expressions while confining world effects to expressions. In a monadic language based upon λ_ml, this (unorthodox) approach would mean that its pure functional sublanguage is allowed to produce control effects. The syntactic distinction also leads to the interpretation of terms and expressions as complete languages of their own, which makes λ○ a candidate for a unified framework under which to study two languages that have traditionally been studied separately: Haskell (corresponding to terms) and ML (corresponding to expressions). Ultimately we believe that the idea of the syntactic distinction conveys a design principle not found in other monadic languages.


Chapter  3 

The  Probabilistic  Language  PTP 


This chapter presents the syntax, type system, and operational semantics of PTP. We give examples to demonstrate properties of PTP, and show how to verify that a program correctly encodes a target probability distribution. We propose the Monte Carlo method [40] as a means of overcoming a limitation of PTP, namely lack of support for precise reasoning about probability distributions.

For the reader who has read the previous chapter, PTP may be viewed as a simplified account of λ○ with language constructs for probabilistic computations in Section 2.4.1. A source of simplification is that a world, which is an infinite sequence of random numbers, does not affect types of terms and expressions; hence typing judgments in PTP do not require worlds. The following table shows judgments in λ○ and their corresponding judgments in PTP:


    Judgments in λ○             Judgments in PTP
    Γ ⊢_s M : A @ ω             Γ ⊢_p M : A
    Γ ⊢_s E ÷ A @ ω             Γ ⊢_p E ÷ A
    M ↦_t N                     (same)
    M ⇀ V                       (same)
    E @ ω ↦_e F @ ω'            (same)
    E @ ω ⇁ V @ ω'              (same)

The syntax of PTP uses type constructors familiar from programming languages (rather than logic) and more specific keywords specialized to probability distributions:

    Syntax of λ○                Syntax of PTP
    A ⊃ B                       A → B
    A ∧ B                       A × B
    cmp E                       prob E
    letcmp x ◁ M in E           sample x from M in E

The definition of PTP in this chapter is self-contained, but should be supplemented by the previous chapter for its logical foundation.


3.1  Definition  of  PTP 

3.1.1  Syntax  and  type  system 

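PTP augments the lambda calculus, consisting of terms, with a separate syntactic category, consisting of expressions in a monadic syntax. Terms denote regular values and expressions denote probabilistic computations. We say that a term evaluates to a value and an expression computes to a sample.

\[
\begin{array}{llcl}
\text{type} & A, B & ::= & A \to A \mid A \times A \mid \bigcirc A \mid \mathsf{real} \\
\text{term} & M, N & ::= & x \mid \lambda x{:}A.\,M \mid M\ M \mid (M, M) \mid \mathsf{fst}\ M \mid \mathsf{snd}\ M \mid \mathsf{fix}\ x{:}A.\,M \mid \mathsf{prob}\ E \mid r \\
\text{expression} & E, F & ::= & M \mid \mathsf{sample}\ x\ \mathsf{from}\ M\ \mathsf{in}\ E \mid \mathcal{S} \\
\text{value/sample} & V & ::= & \lambda x{:}A.\,M \mid (V, V) \mid \mathsf{prob}\ E \mid r \\
\text{real number} & r & & \\
\text{sampling sequence} & \omega & ::= & r_1 r_2 \cdots r_i \cdots \quad \text{where } r_i \in (0.0, 1.0] \\
\text{typing context} & \Gamma & ::= & \cdot \mid \Gamma, x : A
\end{array}
\]

Figure 3.1: Abstract syntax for PTP.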


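\[
\frac{}{\Gamma, x : A \vdash_p x : A}\ \mathsf{Hyp}
\qquad
\frac{\Gamma, x : A \vdash_p M : B}{\Gamma \vdash_p \lambda x{:}A.\,M : A \to B}\ \mathsf{Lam}
\qquad
\frac{\Gamma \vdash_p M_1 : A \to B \quad \Gamma \vdash_p M_2 : A}{\Gamma \vdash_p M_1\ M_2 : B}\ \mathsf{App}
\]
\[
\frac{\Gamma \vdash_p M_1 : A_1 \quad \Gamma \vdash_p M_2 : A_2}{\Gamma \vdash_p (M_1, M_2) : A_1 \times A_2}\ \mathsf{Prod}
\qquad
\frac{\Gamma \vdash_p M : A_1 \times A_2}{\Gamma \vdash_p \mathsf{fst}\ M : A_1}\ \mathsf{Fst}
\qquad
\frac{\Gamma \vdash_p M : A_1 \times A_2}{\Gamma \vdash_p \mathsf{snd}\ M : A_2}\ \mathsf{Snd}
\]
\[
\frac{\Gamma, x : A \vdash_p M : A}{\Gamma \vdash_p \mathsf{fix}\ x{:}A.\,M : A}\ \mathsf{Fix}
\qquad
\frac{\Gamma \vdash_p E \div A}{\Gamma \vdash_p \mathsf{prob}\ E : \bigcirc A}\ \mathsf{Prob}
\qquad
\frac{}{\Gamma \vdash_p r : \mathsf{real}}\ \mathsf{Real}
\]
\[
\frac{\Gamma \vdash_p M : A}{\Gamma \vdash_p M \div A}\ \mathsf{Term}
\qquad
\frac{\Gamma \vdash_p M : \bigcirc A \quad \Gamma, x : A \vdash_p E \div B}{\Gamma \vdash_p \mathsf{sample}\ x\ \mathsf{from}\ M\ \mathsf{in}\ E \div B}\ \mathsf{Bind}
\qquad
\frac{}{\Gamma \vdash_p \mathcal{S} \div \mathsf{real}}\ \mathsf{Sampling}
\]

Figure 3.2: Typing rules of PTP.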


Figure 3.1 shows the abstract syntax for PTP. We use x for variables. λx: A. M is a lambda abstraction, and M M is an application term. (M, M) is a product term, and fst M and snd M are projection terms; we include these terms to support joint distributions. fix x: A. M is a fixed point construct for recursive evaluations. A probability term prob E encapsulates expression E; it is a first-class value denoting a probability distribution. r is a real number.

There are three kinds of expressions: term M, bind expression sample x from M in E, and sampling expression S. As an expression, M returns (with probability 1) the result of evaluating M. sample x from M in E sequences two probabilistic computations (if M evaluates to a probability term). S consumes a random number in a sampling sequence, an infinite sequence of random numbers drawn independently from U(0.0, 1.0].

The type system employs two kinds of typing judgments:

• Term typing judgment Γ ⊢_p M : A, meaning that M evaluates to a value of type A under typing context Γ.

• Expression typing judgment Γ ⊢_p E ÷ A, meaning that E computes to a sample of type A under typing context Γ.

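A typing context Γ is a set of bindings x : A. Figure 3.2 shows the typing rules of PTP. The rule Prob is the introduction rule for the type constructor ○; it means that type ○A denotes probability distributions over type A. The rule Bind is the elimination rule for the type constructor ○. The rule Term means that every term converts into a probabilistic computation that involves no probabilistic choice. The rule Real shows that real is the type of real numbers. A sampling expression S also has type real, as shown in the rule Sampling, because it computes to a real number.

\[
\frac{M \mapsto_t M'}{M\ N \mapsto_t M'\ N}\ T_{\beta_L}
\qquad
\frac{N \mapsto_t N'}{(\lambda x{:}A.\,M)\ N \mapsto_t (\lambda x{:}A.\,M)\ N'}\ T_{\beta_R}
\qquad
\frac{}{(\lambda x{:}A.\,M)\ V \mapsto_t [V/x]M}\ T_{\beta_V}
\]
\[
\frac{M \mapsto_t M'}{(M, N) \mapsto_t (M', N)}\ T_{P_L}
\qquad
\frac{N \mapsto_t N'}{(V, N) \mapsto_t (V, N')}\ T_{P_R}
\]
\[
\frac{M \mapsto_t N}{\mathsf{fst}\ M \mapsto_t \mathsf{fst}\ N}\ T_{Fst}
\qquad
\frac{}{\mathsf{fst}\ (V, V') \mapsto_t V}\ T_{Fst'}
\qquad
\frac{M \mapsto_t N}{\mathsf{snd}\ M \mapsto_t \mathsf{snd}\ N}\ T_{Snd}
\qquad
\frac{}{\mathsf{snd}\ (V, V') \mapsto_t V'}\ T_{Snd'}
\]
\[
\frac{}{\mathsf{fix}\ x{:}A.\,M \mapsto_t [\mathsf{fix}\ x{:}A.\,M/x]M}\ T_{Fix}
\qquad
\frac{M \mapsto_t N}{M \mathbin{@} \omega \mapsto_e N \mathbin{@} \omega}\ E_{Term}
\]
\[
\frac{M \mapsto_t N}{\mathsf{sample}\ x\ \mathsf{from}\ M\ \mathsf{in}\ F \mathbin{@} \omega \mapsto_e \mathsf{sample}\ x\ \mathsf{from}\ N\ \mathsf{in}\ F \mathbin{@} \omega}\ E_{Bind}
\]
\[
\frac{E \mathbin{@} \omega \mapsto_e E' \mathbin{@} \omega'}{\mathsf{sample}\ x\ \mathsf{from}\ \mathsf{prob}\ E\ \mathsf{in}\ F \mathbin{@} \omega \mapsto_e \mathsf{sample}\ x\ \mathsf{from}\ \mathsf{prob}\ E'\ \mathsf{in}\ F \mathbin{@} \omega'}\ E_{BindR}
\]
\[
\frac{}{\mathsf{sample}\ x\ \mathsf{from}\ \mathsf{prob}\ V\ \mathsf{in}\ F \mathbin{@} \omega \mapsto_e [V/x]F \mathbin{@} \omega}\ E_{BindV}
\qquad
\frac{}{\mathcal{S} \mathbin{@} r\omega \mapsto_e r \mathbin{@} \omega}\ \mathsf{Sampling}
\]

Figure 3.3: Operational semantics of PTP.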

3.1.2  Operational  semantics 

Since PTP draws a syntactic distinction between regular values and probabilistic computations, its operational semantics needs two kinds of judgments:

• Term evaluation judgment M ⇀ V, meaning that term M evaluates to value V.

• Expression computation judgment E @ ω ⇁ V @ ω', meaning that expression E with sampling sequence ω computes to sample V with remaining sampling sequence ω'. Conceptually E @ ω ⇁ V @ ω' consumes random numbers in ω − ω'. Properties of the consumed sequence ω − ω' (e.g., its length) are not directly observable.

For term evaluations, we introduce a term reduction M ↦_t N in a call-by-value discipline (we could equally choose call-by-name or call-by-need). We identify M ↦_t* V with M ⇀ V, where ↦_t* is the reflexive and transitive closure of ↦_t. For expression computations, we introduce an expression reduction E @ ω ↦_e F @ ω' such that E @ ω ↦_e* V @ ω' is identified with E @ ω ⇁ V @ ω', where ↦_e* is the reflexive and transitive closure of ↦_e. Both reductions use capture-avoiding term substitutions [M/x]N and [M/x]E defined in a standard way, as in Section 2.3.3.

Figure 3.3 shows the reduction rules in the operational semantics of PTP. Expression reductions may invoke term reductions (e.g., to reduce M in sample x from M in E). The rules EBindR and EBindV mean that given a bind expression sample x from prob E in F, we finish computing E before substituting a value for x in F. Note that like a term evaluation, an expression computation itself is deterministic; it is only when we vary sampling sequences that an expression exhibits probabilistic behavior.

An expression computation E @ ω ⇁ V @ ω' means that E takes a sampling sequence ω, consumes a finite prefix of ω in order, and returns a sample V with the remaining sampling sequence ω':

Proposition 3.1. If E @ ω ⇁ V @ ω', then ω = r_1 r_2 ··· r_n ω' (n ≥ 0) where

\[
E \mathbin{@} \omega \mapsto_e^* \cdots \mapsto_e^* E_i \mathbin{@} r_{i+1} \cdots r_n \omega' \mapsto_e^* \cdots \mapsto_e^* E_n \mathbin{@} \omega' \mapsto_e^* V \mathbin{@} \omega'
\]

for a sequence of expressions E_1, ···, E_n.

Thus an expression computation coincides with the operational description of a sampling function when applied to a sampling sequence, which implies that an expression represents a sampling function. (Here we use a generalized notion of sampling function mapping (0.0, 1.0]^∞ to A × (0.0, 1.0]^∞ for a certain type A.)
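
The reading of an expression as a generalized sampling function can be made concrete with a small sketch. The Python code below is an illustration only, under the assumption that a finite tuple stands in for the infinite sampling sequence; the helper names term, S, and sample_from are hypothetical and are not PTP syntax.

```python
# Sketch: an expression as a function from a sampling sequence omega
# (a tuple of numbers in (0.0, 1.0]) to a pair (sample, remaining sequence).
# The helper names are illustrative assumptions.

def term(value):
    """A term used as an expression: it consumes no random number."""
    return lambda omega: (value, omega)

def S(omega):
    """Sampling expression: consume the first random number of omega."""
    return omega[0], omega[1:]

def sample_from(expr, body):
    """Bind: run expr to completion, then run body on its sample."""
    def run(omega):
        x, rest = expr(omega)
        return body(x)(rest)
    return run

# sample x from prob S in sample y from prob S in x + y
two_draws = sample_from(S, lambda x: sample_from(S, lambda y: term(x + y)))

omega = (0.25, 0.5, 0.75, 1.0)
value, remaining = two_draws(omega)
```

Running the same expression on the same sequence always yields the same pair, and the consumed numbers form a prefix of omega, in line with Proposition 3.1.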

The type safety of PTP consists of two properties: type preservation and progress. Their proofs are omitted as they are special cases of Theorems 2.8 and 2.10, except for S, which satisfies the type-preservation and monotonicity requirements on instructions.

Theorem 3.2 (Type preservation).

If M ↦_t N and · ⊢_p M : A, then · ⊢_p N : A.

If E @ ω ↦_e F @ ω' and · ⊢_p E ÷ A, then · ⊢_p F ÷ A.

Theorem 3.3 (Progress).

If · ⊢_p M : A, then either M is a value (i.e., M = V), or there exists N such that M ↦_t N.
If · ⊢_p E ÷ A, then either E is a sample (i.e., E = V), or for any sampling sequence ω, there exist F and ω' such that E @ ω ↦_e F @ ω'.

3.1.3  Fixed  point  construct  for  expressions 

In PTP, expressions describe non-recursive probabilistic computations. Since some probability distributions are defined in a recursive way (e.g., geometric distributions), it is desirable to be able to describe recursive probabilistic computations as well. To this end, we introduce an expression variable 𝐱 and an expression fixed point construct efix 𝐱 ÷ A. E; a new form of binding 𝐱 ÷ A for expression variables is used in typing contexts:

    expression       E ::= ··· | 𝐱 | efix 𝐱 ÷ A. E
    typing context   Γ ::= ··· | Γ, 𝐱 ÷ A

New typing rules and reduction rule are as follows:

\[
\frac{}{\Gamma, \mathbf{x} \div A \vdash_p \mathbf{x} \div A}
\qquad
\frac{\Gamma, \mathbf{x} \div A \vdash_p E \div A}{\Gamma \vdash_p \mathsf{efix}\ \mathbf{x} \div A.\,E \div A}
\]
\[
\frac{}{\mathsf{efix}\ \mathbf{x} \div A.\,E \mathbin{@} \omega \mapsto_e [\mathsf{efix}\ \mathbf{x} \div A.\,E/\mathbf{x}]E \mathbin{@} \omega}\ \mathsf{Efix}
\]

In the rule Efix, [efix 𝐱 ÷ A. E/𝐱]E denotes a capture-avoiding substitution of efix 𝐱 ÷ A. E for expression variable 𝐱.

Expression fixed point constructs are syntactic sugar as they can be simulated with fixed point constructs for terms. See Section 2.5.1 for details.

3.1.4  Distinguishing terms and expressions

The syntactic distinction between terms and expressions in PTP is optional in the sense that the grammar does not need to distinguish expressions as a separate non-terminal. On the other hand, the semantic distinction, both statically (in the form of term and expression typing judgments) and dynamically (in the form of evaluation and computation judgments), appears to be essential for a clean formulation of PTP.

PTP is a conservative extension of a conventional language because terms constitute a conventional language of their own. In essence, term evaluations are always deterministic and we need only terms when writing deterministic programs. As a separate syntactic category, expressions provide a framework for probabilistic computation that abstracts from the definition of terms. For example, the addition of a new term construct does not change the definition of expressions. When programming in PTP, therefore, the syntactic distinction between terms and expressions aids us in deciding which of deterministic evaluations and probabilistic computations we should focus on. In the next section, we show how to encode various probability distributions and further investigate properties of PTP.


3.2  Examples 

When encoding a probability distribution in PTP, we naturally concentrate on a method of generating samples, rather than calculating the probability assigned to each event. If the probability distribution itself is defined in terms of a process of generating samples, we simply translate the definition. If, however, the probability distribution is defined in terms of a probability measure or an equivalent, we may not always derive a sampling function in a mechanical manner. Instead we have to exploit its unique properties to devise a sampling function.

Below we show examples of encoding various probability distributions in PTP. These examples demonstrate three properties of PTP: a unified representation scheme for probability distributions, rich expressiveness, and high versatility in encoding probability distributions. The sampling methods used in the examples are all found in simulation theory [10]. Thus PTP is a programming language in which sampling methods developed in simulation theory can be formally expressed in a fashion that is concise and readable while remaining as efficient as the originals.

We assume primitive types int and bool (with boolean values True and False), arithmetic and comparison operators, and a conditional term construct if M then N1 else N2. We also assume standard let-binding, recursive let rec-binding, and pattern matching when it is convenient for the examples.¹ We use the following syntactic sugar for expressions:

    unprob M                ≡  sample x from M in x
    eif M then E1 else E2   ≡  unprob (if M then prob E1 else prob E2)

unprob M chooses a sample from the probability distribution denoted by M (we choose the keyword unprob to suggest that it does the opposite of what prob does). eif M then E1 else E2 branches to either E1 or E2 depending on the result of evaluating M.

¹ If type inference and polymorphism are ignored, let-binding and recursive let rec-binding may be interpreted as follows, where _ is a wildcard pattern for types:

    let x = M in N        ≡  (λx: _. N) M
    let rec x = M in N    ≡  let x = fix x: _. M in N

Unified representation scheme

PTP provides a unified representation scheme for probability distributions. While its type system distinguishes between different probability domains, its operational semantics does not distinguish between different kinds of probability distributions, such as discrete, continuous, or neither. We show an example for each case.

We encode a Bernoulli distribution over type bool with parameter p as follows:

    let bernoulli = λp: real. prob sample x from prob S in
                              x < p

bernoulli can be thought of as a binary choice construct. It is expressive enough to specify any discrete distribution with finite support. In fact, bernoulli 0.5 suffices to specify all such probability distributions, since it is capable of simulating a binary choice construct [21] (if the probability assigned to each element in the domain is computable).
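
For comparison, the same sampling function can be sketched in a conventional language. The Python code below is an illustration only and is not the embedding used in this dissertation: the sampling expression S is approximated by a pseudo-random generator, and the function names are assumptions.

```python
import random

def S(rng):
    """Draw from (0.0, 1.0]; rng.random() itself returns values in [0.0, 1.0)."""
    return 1.0 - rng.random()

def bernoulli(p):
    """Return a sampler for the Bernoulli distribution with parameter p."""
    return lambda rng: S(rng) < p

rng = random.Random(0)          # fixed seed so that the run is repeatable
coin = bernoulli(0.3)
draws = [coin(rng) for _ in range(10000)]
frequency = sum(draws) / len(draws)   # should be close to 0.3
```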

As an example of continuous distribution, we encode a uniform distribution over a real interval (a, b] by exploiting the definition of the sampling expression:

    let uniform = λa: real. λb: real. prob sample x from prob S in
                                      a + x * (b - a)

We also encode a combination of a point-mass distribution and a uniform distribution over the same domain, which is neither a discrete distribution nor a continuous distribution:

    let point_uniform = prob sample x from prob S in
                        if x < 0.5 then 0.0 else x
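
The two encodings translate directly into ordinary sampling code. The following Python sketch is only an illustration; it assumes a pseudo-random generator in place of the sampling expression S, and the function names are hypothetical.

```python
import random

def S(rng):
    """Draw from (0.0, 1.0]; rng.random() returns values in [0.0, 1.0)."""
    return 1.0 - rng.random()

def uniform(a, b):
    """Sampler for the uniform distribution over the interval (a, b]."""
    return lambda rng: a + S(rng) * (b - a)

def point_uniform(rng):
    """Point mass at 0.0 with weight 0.5, otherwise uniform over [0.5, 1.0]."""
    x = S(rng)
    return 0.0 if x < 0.5 else x

rng = random.Random(1)
uniform_draws = [uniform(2.0, 5.0)(rng) for _ in range(1000)]
mixed_draws = [point_uniform(rng) for _ in range(1000)]
```

Both samplers have the same shape, a function from a source of random numbers to a sample, even though one distribution is continuous and the other is neither discrete nor continuous.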


Rich  expressiveness 

We now demonstrate the expressive power of PTP with a number of examples.

We encode a binomial distribution with parameters p and n0 by exploiting probability terms:

    let binomial = λp: real. λn0: int.
      let bernoulli_p = bernoulli p in
      let rec binomial_p = λn: int.
        if n = 0 then prob 0
        else prob sample x from binomial_p (n - 1) in
                  sample b from bernoulli_p in
                  if b then 1 + x else x
      in
      binomial_p n0

Here binomial_p takes an integer n as input and returns a binomial distribution with parameters p and n.
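
The recursive structure of this encoding carries over to a conventional language. The Python sketch below is an illustration under the assumption that a pseudo-random generator plays the role of the sampling expression; the names are hypothetical.

```python
import random

def S(rng):
    return 1.0 - rng.random()        # value in (0.0, 1.0]

def bernoulli(p):
    return lambda rng: S(rng) < p

def binomial(p, n0):
    """Sampler for the binomial distribution with parameters p and n0."""
    coin = bernoulli(p)

    def binomial_p(n):
        if n == 0:
            return lambda rng: 0     # corresponds to "prob 0"
        def run(rng):
            x = binomial_p(n - 1)(rng)   # successes among the first n - 1 trials
            b = coin(rng)                # outcome of the n-th trial
            return 1 + x if b else x
        return run

    return binomial_p(n0)

rng = random.Random(2)
sampler = binomial(0.5, 10)
draws = [sampler(rng) for _ in range(2000)]
mean = sum(draws) / len(draws)       # should be close to 10 * 0.5
```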

If a probability distribution is defined in terms of a recursive process of generating samples, we can translate the definition into a recursive term. For example, we encode a geometric distribution with parameter p, which is a discrete distribution with infinite support, as follows:

    let geometric_rec = λp: real.
      let bernoulli_p = bernoulli p in
      let rec geometric = prob sample b from bernoulli_p in
                               eif b then 0
                               else sample x from geometric in
                                    1 + x
      in
      geometric

Here we use a recursive term geometric of type ○int. Equivalently we can use an expression fixed point construct:

    let geometric_efix = λp: real.
      let bernoulli_p = bernoulli p in
      prob efix geometric ÷ int.
             sample b from bernoulli_p in
             eif b then 0
             else sample x from prob geometric in
                  1 + x
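
A recursive sampler of the same shape can be sketched in Python. This is an illustration only, with a pseudo-random generator standing in for the sampling expression and with hypothetical names; the sample is the number of failed trials before the first success.

```python
import random

def S(rng):
    return 1.0 - rng.random()        # value in (0.0, 1.0]

def bernoulli(p):
    return lambda rng: S(rng) < p

def geometric(p):
    """Sampler counting the failures that precede the first success."""
    coin = bernoulli(p)

    def run(rng):
        if coin(rng):
            return 0
        return 1 + run(rng)          # recursive call mirrors the recursive term
    return run

rng = random.Random(3)
sampler = geometric(0.5)
draws = [sampler(rng) for _ in range(5000)]
mean = sum(draws) / len(draws)       # expected value is (1 - p) / p = 1.0
```

For small p the recursion can become deep, so a loop would be the more practical choice in Python; the recursive form is kept here to match the encoding.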

We encode an exponential distribution by using the inverse of its cumulative distribution function as a sampling function, which is known as the inverse transform method:

    let exponential_1.0 = prob sample x from prob S in
                          -log x
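
To see why the negated logarithm works, note that the exponential distribution with rate 1 has cumulative distribution function F(y) = 1 - e^{-y}, whose inverse is F^{-1}(u) = -log(1 - u). If u is uniform over [0.0, 1.0), then x = 1 - u is uniform over (0.0, 1.0], so -log x has the desired distribution. The Python sketch below illustrates this under the assumption that a pseudo-random generator replaces the sampling expression; the names are hypothetical.

```python
import math
import random

def S(rng):
    """Draw from (0.0, 1.0], so that the logarithm is always defined."""
    return 1.0 - rng.random()

def exponential(rng):
    """Inverse transform sampling for the exponential distribution with rate 1."""
    return -math.log(S(rng))

rng = random.Random(4)
draws = [exponential(rng) for _ in range(5000)]
mean = sum(draws) / len(draws)       # expected value is 1.0
```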

The rejection method, which generates a sample from a probability distribution by repeatedly generating samples from other probability distributions until they satisfy a certain termination condition, can be implemented with a recursive term. For example, we encode a Gaussian distribution with mean m and variance σ² by the rejection method with respect to exponential distributions:

    let bernoulli_0.5 = bernoulli 0.5

    let gaussian_rejection = λm: real. λσ: real.
      let rec gaussian = prob sample y1 from exponential_1.0 in
                              sample y2 from exponential_1.0 in
                              eif y2 > (y1 - 1.0)² / 2.0 then
                                sample b from bernoulli_0.5 in
                                if b then m + σ * y1 else m - σ * y1
                              else unprob gaussian
      in
      gaussian

Since the probability p of y2 > (y1 - 1.0)²/2.0 (the termination condition) is positive, the rejection method above terminates with probability

\[
p + (1 - p)p + (1 - p)^2 p + \cdots = \frac{p}{1 - (1 - p)} = 1.
\]

In this way, programmers can ensure that a particular sampling strategy by the rejection method terminates with probability 1.
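
The same rejection scheme can be sketched in Python. The code is an illustration only: a loop replaces the recursive call, a pseudo-random generator stands in for the sampling expression, and all names are assumptions.

```python
import math
import random

def S(rng):
    return 1.0 - rng.random()            # value in (0.0, 1.0]

def exponential(rng):
    return -math.log(S(rng))             # exponential distribution with rate 1

def gaussian_rejection(m, sigma):
    """Sampler for a Gaussian with mean m and standard deviation sigma."""
    def run(rng):
        while True:                      # each iteration is one attempt
            y1 = exponential(rng)
            y2 = exponential(rng)
            if y2 > (y1 - 1.0) ** 2 / 2.0:       # termination condition
                b = S(rng) < 0.5                 # choose the sign of the sample
                return m + sigma * y1 if b else m - sigma * y1
    return run

rng = random.Random(5)
sampler = gaussian_rejection(1.0, 2.0)
draws = [sampler(rng) for _ in range(20000)]
mean = sum(draws) / len(draws)
variance = sum((d - mean) ** 2 for d in draws) / len(draws)
```

With these parameters the empirical mean and variance should be close to 1.0 and 4.0, respectively.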

We encode the joint distribution between two independent probability distributions using a product term. If M_P denotes P(x) and M_Q denotes Q(y), the following term denotes the joint distribution $\mathrm{Prob}(x, y) \propto P(x)Q(y)$:

    prob sample x from M_P in
         sample y from M_Q in
         (x, y)

For the joint distribution between two interdependent probability distributions, we use a conditional probability, which we represent as a lambda abstraction taking a regular value and returning a probability distribution. If M_P denotes P(x) and M_Q denotes a conditional probability Q(y|x), the following term denotes the joint distribution $\mathrm{Prob}(x, y) \propto P(x)Q(y \mid x)$:

    prob sample x from M_P in
         sample y from M_Q x in
         (x, y)

By returning y instead of (x, y), we compute the integration $\mathrm{Prob}(y) = \int P(x)\,Q(y \mid x)\,dx$:

    prob sample x from M_P in
         sample y from M_Q x in
         y
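
The relation between the joint distribution and its marginal can be sketched in Python. The code below is an illustration only; the distributions and their parameters (0.5, 0.9, 0.1) are hypothetical and chosen so that the marginal probability of y is 0.5.

```python
import random

def S(rng):
    return 1.0 - rng.random()            # value in (0.0, 1.0]

def bernoulli(p):
    return lambda rng: S(rng) < p

def joint(p, q):
    """p samples x from P(x); q maps x to a sampler for Q(y | x)."""
    def run(rng):
        x = p(rng)
        y = q(x)(rng)
        return x, y
    return run

def marginal(p, q):
    """Sample the pair but return only y, which integrates x out."""
    pair = joint(p, q)
    return lambda rng: pair(rng)[1]

# Hypothetical example: P(x = True) = 0.5, Q(y = True | x) = 0.9 or 0.1.
p_x = bernoulli(0.5)
q_y_given_x = lambda x: bernoulli(0.9 if x else 0.1)

rng = random.Random(6)
ys = [marginal(p_x, q_y_given_x)(rng) for _ in range(5000)]
frequency = sum(ys) / len(ys)            # 0.5 * 0.9 + 0.5 * 0.1 = 0.5
```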

Due to lack of semantic constraints on sampling functions, we can specify probability distributions over
unusual domains such as infinite data structures (e.g., trees), function spaces, cyclic spaces (e.g., angular
values), and even probability distributions themselves. For example, we add two probability distributions
over angular values in a straightforward way:

    let add_angle = λa1:○real. λa2:○real. prob sample s1 from a1 in
                                               sample s2 from a2 in
                                               (s1 + s2) mod (2.0 * π)

With the modulo operation mod, we take into account the fact that an angle θ is identified with θ + 2π.
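A minimal Python counterpart of add_angle, assuming again that a distribution is modeled as a function from a random number generator to a sample (an assumption of this sketch, not PTP syntax):

```python
import math
import random


def add_angle(a1, a2):
    """Distribution of the sum of two angular values, reduced modulo 2*pi."""
    def sample(rng):
        s1 = a1(rng)
        s2 = a2(rng)
        return (s1 + s2) % (2.0 * math.pi)
    return sample
```

For non-negative inputs the result always lies in [0, 2π), so the sum of two angles is again an angle in the same representation.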

As a simple application, we implement a belief network [66]:


We assume that P_{alarm|burglary} denotes the probability distribution that the alarm goes off when a burglary
happens; other variables of the form P_{·|·} are interpreted in a similar way.

    let alarm = λ(burglary, earthquake):bool × bool.
        if burglary then P_{alarm|burglary}
        else if earthquake then P_{alarm|¬burglary∧earthquake}
        else P_{alarm|¬burglary∧¬earthquake}

    let john_calls = λalarm:bool.
        if alarm then P_{John_calls|alarm}
        else P_{John_calls|¬alarm}

    let mary_calls = λalarm:bool.
        if alarm then P_{Mary_calls|alarm}
        else P_{Mary_calls|¬alarm}
The conditional probabilities alarm, john_calls, and mary_calls do not answer any query on the
belief network and only describe its structure. In order to answer a specific query, we have to implement
a corresponding probability distribution. For example, in order to answer “What is the probability
p_{Mary_calls|John_calls} that Mary calls when John calls?”, we use Q_{Mary_calls|John_calls} below, which
essentially implements logic sampling [26]:

    let rec Q_{Mary_calls|John_calls} = prob sample b from P_burglary in
                                             sample e from P_earthquake in
                                             sample a from alarm (b, e) in
                                             sample j from john_calls a in
                                             sample m from mary_calls a in
                                             eif j then m else unprob Q_{Mary_calls|John_calls}
    in
    Q_{Mary_calls|John_calls}

P_burglary denotes the probability distribution that a burglary happens, and P_earthquake the probability
distribution that an earthquake happens. Then the mean of Q_{Mary_calls|John_calls} gives p_{Mary_calls|John_calls}.
We will see how to calculate p_{Mary_calls|John_calls} in Section 3.4.
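The logic sampling scheme can be sketched in Python as below. All numeric probabilities are hypothetical placeholders chosen for illustration (the text does not give them), and the recursion through unprob is written as a loop that discards every run in which John does not call.

```python
import random

# Hypothetical probabilities, for illustration only.
P_BURGLARY = 0.1
P_EARTHQUAKE = 0.2


def p_alarm(burglary, earthquake):
    if burglary:
        return 0.95
    elif earthquake:
        return 0.3
    return 0.01


def p_john(alarm):
    return 0.9 if alarm else 0.05


def p_mary(alarm):
    return 0.7 if alarm else 0.01


def flip(p, rng):
    """One Bernoulli trial with success probability p."""
    return (1.0 - rng.random()) < p


def q_mary_given_john(rng):
    """Sample whether Mary calls, conditioned on John calling."""
    while True:
        b = flip(P_BURGLARY, rng)
        e = flip(P_EARTHQUAKE, rng)
        a = flip(p_alarm(b, e), rng)
        j = flip(p_john(a), rng)
        m = flip(p_mary(a), rng)
        if j:  # keep the run only if the evidence holds
            return m
```

The proportion of True among many samples from `q_mary_given_john` approximates the conditional probability for the chosen numbers.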

We  can  also  implement  most  of  the  common  operations  on  probability  distributions.  An  exception  is 
the Bayes operation ♯ (which is used in the second update equation of the Bayes filter). P ♯ Q results in
a probability distribution R such that R(x) = ηP(x)Q(x) where η is a normalization constant ensuring
∫ R(x) dx = 1.0; if P(x)Q(x) is zero for every x, then P ♯ Q is undefined. Since it is difficult to achieve
a general implementation of P ♯ Q, we usually make an additional assumption on P and Q to achieve
a specialized implementation. For example, if we have a function p and a constant c such that p(x) =
kP(x) ≤ c for a certain constant k, we can implement P ♯ Q by the rejection method:

    let bayes_rejection = λp:A → real. λc:real. λQ:○A.
        let rec bayes = prob sample x from Q in
                             sample u from prob S in
                             eif u < (p x)/c then x else unprob bayes
        in
        bayes


We  will  see  another  implementation  in  Section  3.4. 
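For concreteness, the rejection-based Bayes operation might look as follows in Python. The sketch assumes that Q is given as a sampler (a function from a random number generator to a sample) and that p(x)/c never exceeds 1; the function name is ours.

```python
import random


def bayes_rejection(p, c, q):
    """Sampler for R(x) proportional to p(x)Q(x), assuming 0 <= p(x) <= c."""
    def sample(rng):
        while True:
            x = q(rng)                # propose from Q
            u = 1.0 - rng.random()    # u drawn from U(0.0, 1.0]
            if u < p(x) / c:          # accept with probability p(x)/c
                return x
    return sample
```

For example, with Q uniform over {0, 1, 2, 3}, p(x) = x, and c = 3, the resulting sampler never returns 0 and returns 3 about half of the time, since the weights 1, 2, 3 sum to 6.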


High  versatility 

PTP allows high versatility in encoding probability distributions: given a probability distribution, we can
exploit its unique properties and encode it in many different ways. For example, exponential_1.0 uses a
logarithm function to encode an exponential distribution, but there is also an ingenious method (due to von
Neumann) that requires only addition and subtraction operations:

    let exponential_von_Neumann_1.0 =
        let rec search = λk:real. λu:real. λu1:real.
            prob sample u' from prob S in
                 eif u < u' then k + u1
                 else
                     sample u from prob S in
                     eif u < u' then unprob (search k u u1)
                     else
                         sample u from prob S in
                         unprob (search (k + 1.0) u u)
        in
        prob sample u from prob S in
             unprob (search 0.0 u u)
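The control flow of the von Neumann encoding is easier to follow in an iterative form. The Python sketch below is our own rendering (the names `unit` and `exponential_von_neumann` are assumptions): a decreasing run of random numbers starting at u1 is generated; a run of odd length accepts k + u1, while a run of even length increments k and restarts with a fresh number.

```python
import random


def unit(rng):
    """Draw from U(0.0, 1.0], playing the role of the sampling expression S."""
    return 1.0 - rng.random()


def exponential_von_neumann(rng):
    """Exponential distribution with rate 1.0 using only comparisons and additions."""
    k = 0.0
    while True:
        u1 = unit(rng)            # first number of a new run
        u = u1
        while True:
            u_next = unit(rng)
            if u < u_next:        # the decreasing run ends at odd length: accept
                return k + u1
            u = unit(rng)
            if not (u < u_next):  # the decreasing run ends at even length: reject
                break
        k += 1.0                  # corresponds to search (k + 1.0) u u
```

Samples should have mean close to 1.0, and about 63% of them (1 - 1/e) should fall at or below 1.0.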

The recursive term in gaussian_rejection consumes at least three random numbers. We can encode a
Gaussian distribution with only two random numbers:

    let gaussian_Box_Muller = λm:real. λσ:real.
        prob sample u from prob S in
             sample v from prob S in
             m + σ * √(-2.0 * log u) * cos (2.0 * π * v)

We can also approximate a Gaussian distribution by exploiting the central limit theorem:

    let gaussian_central = λm:real. λσ:real.
        prob sample x1 from prob S in
             sample x2 from prob S in
             ···
             sample x12 from prob S in
             m + σ * (x1 + x2 + ··· + x12 - 6.0)

The three examples above serve as evidence of high versatility of PTP: the more we know about a
probability distribution, the better we can encode it.
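Both Gaussian encodings translate almost literally into Python. The sketch below assumes a helper `unit` for U(0.0, 1.0]; because the argument of the logarithm is never zero, the Box-Muller formula is always defined.

```python
import math
import random


def unit(rng):
    """Draw from U(0.0, 1.0], playing the role of the sampling expression S."""
    return 1.0 - rng.random()


def gaussian_box_muller(m, sigma, rng):
    """Exact Gaussian sample from two random numbers."""
    u = unit(rng)
    v = unit(rng)
    return m + sigma * math.sqrt(-2.0 * math.log(u)) * math.cos(2.0 * math.pi * v)


def gaussian_central(m, sigma, rng):
    """Approximate Gaussian sample from twelve random numbers (central limit theorem)."""
    total = sum(unit(rng) for _ in range(12))  # mean 6.0, variance 1.0
    return m + sigma * (total - 6.0)
```

The second function trades exactness for simplicity: its outcomes are confined to m ± 6σ, whereas the first has the full Gaussian tails.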

All the examples in this section just rely on our intuition on sampling functions and do not actually prove
the correctness of encodings. For example, we still do not know if bernoulli indeed encodes a Bernoulli
distribution, or equivalently, if the expression in it generates True with probability p. In the next section, we
investigate how to formally prove the correctness of encodings.

3.3  Proving  the  correctness  of  encodings 

When programming in PTP, we often ask “What probability distribution characterizes outcomes of computing
a given expression?” The operational semantics of PTP does not directly answer this question because
an expression computation returns only a single sample from a certain, yet unknown, probability distribution.
Therefore we need a different methodology for interpreting expressions directly in terms of probability
distributions.

We take a simple approach that appeals to our intuition on the meaning of expressions. We write E ~
Prob if outcomes of computing E are distributed according to Prob. To determine Prob from E, we
supply an infinite sequence of independent random variables from U(0.0, 1.0] and analyze the result of
computing E in terms of these random variables. If E ~ Prob, then E denotes a probabilistic computation
for generating samples from Prob and we regard Prob as the denotation of prob E.

We illustrate the above approach with a few examples. In each example, R_i means the i-th random
variable and R_i^∞ means the infinite sequence of random variables beginning from R_i (i.e., R_i R_{i+1} ···). A
random variable is regarded as a value because it represents real numbers in (0.0, 1.0].

As a trivial example, consider prob S. The computation of S proceeds as follows:

    S @ R_1^∞  ↦_e  R_1 @ R_2^∞

Since the outcome is a random variable from U(0.0, 1.0], we have S ~ U(0.0, 1.0].

As an example of discrete distribution, consider bernoulli p. The expression in it computes as follows:

           sample x from prob S in x < p      @ R_1^∞
    ↦_e    sample x from prob R_1 in x < p    @ R_2^∞
    ↦_e    R_1 < p                            @ R_2^∞
    ↦_e    True   @ R_2^∞    if R_1 < p;
           False  @ R_2^∞    otherwise.

Since R_1 is a random variable from U(0.0, 1.0], the probability of R_1 < p is p. Thus the outcome is True
with probability p and False with probability 1.0 - p, and bernoulli p denotes a Bernoulli distribution with
parameter p.

As an example of continuous distribution, consider uniform a b. The expression in it computes as
follows:

           sample x from prob S in a + x * (b - a)    @ R_1^∞
    ↦_e*   a + R_1 * (b - a)                          @ R_2^∞

Since we have

\[ a + R_1 \cdot (b - a) \in (a_0, b_0] \quad\text{iff}\quad R_1 \in \left( \frac{a_0 - a}{b - a}, \frac{b_0 - a}{b - a} \right], \]

the probability that the outcome lies in (a_0, b_0] is

\[ \frac{b_0 - a}{b - a} - \frac{a_0 - a}{b - a} = \frac{b_0 - a_0}{b - a} \propto b_0 - a_0, \]

where we assume (a_0, b_0] ⊂ (a, b]. Thus uniform a b denotes a uniform distribution over (a, b].
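The style of analysis used in these examples, in which an expression consumes a prefix of an explicit sequence of random variables, can be mimicked in Python by passing an iterator of random numbers. The names `sampling_sequence`, `uniform`, and `interval_probability` are hypothetical; the sketch only checks the interval probability derived above empirically.

```python
import random


def sampling_sequence(seed):
    """An unbounded sequence R_1 R_2 ... of numbers drawn from U(0.0, 1.0]."""
    rng = random.Random(seed)
    while True:
        yield 1.0 - rng.random()


def uniform(a, b):
    """Expression of uniform a b: consumes exactly one random number."""
    def compute(omega):
        r = next(omega)
        return a + r * (b - a)
    return compute


def interval_probability(compute, lo, hi, n, seed=0):
    """Fraction of n outcomes that land in the half-open interval (lo, hi]."""
    omega = sampling_sequence(seed)
    hits = sum(1 for _ in range(n) if lo < compute(omega) <= hi)
    return hits / n
```

For uniform 2.0 6.0 and the interval (3.0, 4.0], the estimate should approach (4.0 - 3.0)/(6.0 - 2.0) = 0.25.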

The following proposition shows that binomial p n denotes a binomial distribution with parameters p
and n, which we write as Binomial_{p,n}.


Proposition 3.4. If binomial p n ↦_t* prob E_{p,n}, then E_{p,n} ~ Binomial_{p,n}.


Proof. By induction on n.

Base case n = 0. We have E_{p,n} = 0. Since Binomial_{p,n} is a point-mass distribution centered on 0, we
have E_{p,n} ~ Binomial_{p,n}.
Inductive case n > 0. The computation of E_{p,n} proceeds as follows:

           sample x from binomial p (n - 1) in
           sample b from bernoulli p in
           if b then 1 + x else x                        @ R_1^∞
    ↦_e*   sample x from prob x_{p,n-1} in
           sample b from bernoulli p in
           if b then 1 + x else x                        @ R_i^∞
    ↦_e*   sample b from prob b_p in
           if b then 1 + x_{p,n-1} else x_{p,n-1}        @ R_{i+1}^∞
    ↦_e*   1 + x_{p,n-1}  @ R_{i+1}^∞    if b_p = True;
           x_{p,n-1}      @ R_{i+1}^∞    otherwise.

By induction hypothesis, binomial p (n - 1) generates a sample x_{p,n-1} from Binomial_{p,n-1} after consuming
R_1 ··· R_{i-1} for some i (which is actually n). Since R_i is an independent random variable, bernoulli p
generates a sample b_p that is independent of x_{p,n-1}. Then we obtain an outcome k with the probability of

    b_p = True and x_{p,n-1} = k - 1, or
    b_p = False and x_{p,n-1} = k,

which is equal to p * Binomial_{p,n-1}(k - 1) + (1.0 - p) * Binomial_{p,n-1}(k) = Binomial_{p,n}(k). Thus we
have E_{p,n} ~ Binomial_{p,n}. □
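The identity used in the last step of the proof can be checked exactly with rational arithmetic. The following Python sketch builds the distribution of outcomes by the recurrence from the inductive case and compares it with the closed-form binomial probability; the function names are ours.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(p, n):
    """Distribution of outcomes obtained by the recurrence in the proof."""
    if n == 0:
        return {0: Fraction(1)}  # point mass on 0 (base case)
    prev = binomial_pmf(p, n - 1)
    pmf = {}
    for k in range(n + 1):
        # outcome k arises from (True, k - 1) or (False, k)
        pmf[k] = p * prev.get(k - 1, Fraction(0)) + (1 - p) * prev.get(k, Fraction(0))
    return pmf


def closed_form(p, n, k):
    """Binomial probability C(n, k) p^k (1 - p)^(n - k)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)
```

For instance, with `p = Fraction(1, 4)` the two functions agree for every k between 0 and n, which is the content of the equation Binomial_{p,n}(k) = p * Binomial_{p,n-1}(k - 1) + (1.0 - p) * Binomial_{p,n-1}(k).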

As a final example, we show that geometric_rec p denotes a geometric distribution with parameter p.
Suppose geometric ↦_t* prob E and E ~ Prob. The computation of E proceeds as follows:

           E                                             @ R_1^∞
    ↦_e*   sample b from prob b_p in
           eif b then 0
           else sample x from geometric in
                1 + x                                    @ R_2^∞
    ↦_e*   0  @ R_2^∞                                    if b_p = True;
           sample x from prob E in 1 + x  @ R_2^∞        otherwise.

The first case happens with probability p and we get Prob(0) = p. In the second case, we compute the
same expression E with R_2^∞. Since all random variables are independent, R_2^∞ can be thought of as a fresh
sequence of random variables. Therefore the computation of E with R_2^∞ returns samples from the same
probability distribution Prob and we get Prob(1 + k) = (1.0 - p) * Prob(k). Solving the two equations,
we get Prob(k) = p * (1.0 - p)^k, which is the probability mass function for a geometric distribution with
parameter p.

The above approach can be thought of as an adaptation of the methodology established in simulation
theory [10]. The proof of the correctness of a sampling method in simulation theory is easily transcribed
into a proof similar to those shown in this section by interpreting random numbers in simulation theory
as random variables in PTP. Thus PTP serves as a programming language in which sampling methods
developed in simulation theory can be not only formally expressed but also formally reasoned about. All
this is possible in part because an expression computation in PTP is provided with an infinite sequence of
random numbers to consume, or equivalently, because of the use of generalized sampling functions as the
mathematical basis.

An  alternative  approach  would  be  to  develop  a  denotational  semantics  based  upon  measure  theory  [65] 
by  translating  expressions  into  a  measure-theoretic  structure.  Such  a  denotational  semantics  would  be  useful 
in  answering  such  questions  as: 
• Does every expression in PTP result in a measurable sampling function? Or is it possible to write a
pathological expression that corresponds to no measurable sampling function?

• Does every expression in PTP define a probability distribution? Or is it possible to write a pathological
expression that defines no probability distribution?

If we ignore fixed point constructs of PTP, it is straightforward to translate expressions even directly
into probability measures, since probability measures form a monad [22, 64] and expressions already follow
a monadic syntax; a sampling expression S is translated into a Lebesgue measure over the unit interval
(0.0, 1.0]. Let us write ⟦M⟧_term for the denotation of term M. Then we can translate each expression E into
a probability measure ⟦E⟧_exp as follows:

• ⟦prob E⟧_term = ⟦E⟧_exp.

• For a term M used as an expression and a measurable set S, ⟦M⟧_exp(S) = 1 if ⟦M⟧_term is in S, and
⟦M⟧_exp(S) = 0 if ⟦M⟧_term is not in S.

• ⟦sample x from M in E⟧_exp = ∫ f d⟦M⟧_term where a function f is defined as f(x) = ⟦E⟧_exp and
∫ f d⟦M⟧_term is an integral of f over measure ⟦M⟧_term.

• ⟦S⟧_exp is a Lebesgue measure over the unit interval (0.0, 1.0].

Note that the translation does not immediately reveal the probability measure corresponding to a given
expression because it returns a formula for the probability measure rather than the probability measure itself.
Hence, in order to obtain the probability measure, we have to go through essentially the same analysis as
in the above approach. Ultimately we have to invert a sampling function represented by a given expression
(because an event is assigned a probability proportional to the size of its inverse image under the sampling
function), which may not be easy to do in a mechanical way in the presence of various operators.
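To make the monadic reading of the translation concrete, the following Python sketch implements it for finite discrete measures only, represented as dictionaries from outcomes to exact probabilities. This is a deliberate simplification and an assumption of the sketch: the Lebesgue measure for S and general integrals are not representable this way, and the names `point_mass` and `bind` are ours.

```python
from fractions import Fraction


def point_mass(value):
    """Measure of a term used as an expression: all mass on its value."""
    return {value: Fraction(1)}


def bind(measure, f):
    """Measure of `sample x from M in E`: integrate f over the measure of M."""
    out = {}
    for x, px in measure.items():
        for y, py in f(x).items():
            out[y] = out.get(y, Fraction(0)) + px * py
    return out


def bernoulli(p):
    """Discrete measure assigning p to True and 1 - p to False."""
    return {True: p, False: 1 - p}
```

For example, `bind(bernoulli(Fraction(1, 2)), lambda x: bind(bernoulli(Fraction(1, 3)), lambda y: point_mass(x and y)))` assigns probability 1/6 to True and 5/6 to False. In the discrete case the measure is computed directly; the inversion problem described above arises once S and arithmetic on real-valued samples are involved.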

Once we add fixed point constructs to PTP, expressions should be translated into a domain-theoretic
structure because of recursive equations. Specifically a term fix x:○A. M gives rise to a recursive equation
on type ○A, and if a measure-theoretic structure is used for the denotation of terms of type ○A, it is
difficult to solve the recursive equation; only with a domain-theoretic structure can the recursive equation
be given a theoretical treatment. The work by Jones [30] suggests that such a domain-theoretic structure
could be constructed from a domain-theoretic model of real numbers [17], and we leave the development of
a denotational semantics of PTP based upon domain theory as future work.

3.4  Approximate  Computation  in  PTP 

We  have  explored  both  how  to  encode  probability  distributions  in  PTP  and  how  to  interpret  PTP  in  terms 
of  probability  distributions.  In  this  section,  we  discuss  another  important  aspect  of  probabilistic  languages: 
reasoning  about  probability  distributions. 

The  expressive  power  of  a  probabilistic  language  is  an  important  factor  affecting  its  practicality.  Another 
important  factor  is  its  support  for  reasoning  about  probability  distributions  to  determine  their  properties.  In 
other  words,  it  is  important  not  only  to  be  able  to  encode  various  probability  distributions  but  also  to  be 
able to determine their properties such as means, variances, and probabilities of specific events. Unfortunately
PTP does not support precise reasoning about probability distributions. That is, it does not permit
a precise implementation of queries on probability distributions. Intuitively we must be able to calculate
probabilities of specific events, but this is tantamount to inverting sampling functions. Hence, for example,
we cannot calculate p_{Mary_calls|John_calls} in the belief network example in Section 3.2 unless we analyze
Q_{Mary_calls|John_calls} to compute its mean in a similar way to the previous section.
Given that we cannot hope for precise reasoning in PTP, we choose to support approximate reasoning by
the Monte Carlo method [40]. It approximately answers a query on a probability distribution by generating
a large number of samples and then analyzing them. For example, we can approximate p_{Mary_calls|John_calls},
which is equal to the proportion of True's among an infinite number of samples from Q_{Mary_calls|John_calls},
by generating a large number of samples and counting the number of True's. Although the Monte Carlo
method gives only an approximate answer, its accuracy improves with the number of samples. Moreover it
is applicable to all kinds of probability distributions and is therefore particularly suitable for PTP.

In this section, we use the Monte Carlo method to implement the expectation query. We also show
how to exploit the Monte Carlo method in implementing the Bayes operation. Both implementations are
provided as primitive constructs of PTP.


3.4.1  Expectation  query 

Among  common  queries  on  probability  distributions,  the  most  important  is  the  expectation  query.  The 
expectation of a function f with respect to a probability distribution P is the mean of f over P, which we
write as ∫ f dP. Other queries may be derived as special cases of the expectation query. For example, the
mean of a probability distribution over real numbers is the expectation of an identity function; the probability
of an event Event under a probability distribution P is ∫ I_Event dP where I_Event(x) is 1 if x is in Event
and 0 if not.

The Monte Carlo method states that we can approximate ∫ f dP with a set of samples V_1, ···, V_n from P:

\[ \lim_{n \to \infty} \frac{f(V_1) + \cdots + f(V_n)}{n} = \int f \, dP \]

We  introduce  a  term  construct  expectation  which  exploits  the  above  equation: 


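    term  M ::= ··· | expectation M_f M_P


\[ \frac{\Gamma \vdash_p M_f : A \to \mathsf{real} \qquad \Gamma \vdash_p M_P : \bigcirc A}{\Gamma \vdash_p \mathsf{expectation}\; M_f\; M_P : \mathsf{real}} \;\textsf{Exp} \]


\[ \frac{\begin{array}{c} M_f \mapsto_t^* f \qquad M_P \mapsto_t^* \mathsf{prob}\; E_P \\ \text{for } i = 1, \cdots, n \qquad \text{new sampling sequence } \omega_i \qquad E_P \,@\, \omega_i \mapsto_e^* V_i \,@\, \omega_i' \qquad f\; V_i \mapsto_t^* v_i \end{array}}{\mathsf{expectation}\; M_f\; M_P \mapsto_t \frac{\sum_i v_i}{n}} \;\textsf{Exp} \]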

The rule Exp says that if M_f evaluates to a lambda abstraction denoting f and M_P evaluates to a probability
term denoting P, then expectation M_f M_P reduces to an approximation of ∫ f dP. A run-time
variable n (to be chosen by programmers) specifies the number of samples to generate from P. To evaluate
expectation M_f M_P, the run-time system initializes sampling sequence ω_i to generate sample V_i for
i = 1, ···, n (as indicated by new sampling sequence ω_i).
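A minimal Python sketch of the expectation query, under the assumption that a probability term is modeled as a function from a random number generator to a sample. Deriving every sampling sequence from a single seed is our own device for making the result repeatable; it is not how the rule itself is stated.

```python
import random


def expectation(f, sample_p, n, seed=0):
    """Approximate the integral of f over P by averaging f over n samples."""
    master = random.Random(seed)  # fixes every omega_i, so the result is repeatable
    total = 0.0
    for _ in range(n):
        omega_i = random.Random(master.getrandbits(64))  # new sampling sequence
        v_i = sample_p(omega_i)  # the expression for P computes to sample V_i
        total += f(v_i)          # f applied to V_i gives v_i
    return total / n


def bernoulli(p):
    """A sampler that returns True with probability p."""
    return lambda rng: (1.0 - rng.random()) < p
```

For example, `expectation(lambda x: 1.0 if x else 0.0, bernoulli(0.25), 20000)` estimates the probability of True under a Bernoulli distribution with parameter 0.25; a larger n gives a closer estimate at proportionally higher cost.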

In the rule Exp, the accuracy of (Σ_i v_i)/n is controlled not by PTP but solely by programmers. That is, PTP
is not responsible for choosing a value of n (e.g., by analyzing E_P) to guarantee a certain level of accuracy
in estimating ∫ f dP. Rather it is programmers that decide a suitable value of n to achieve a desired level
of accuracy (as well as an expression E_P for encoding P). Programmers are also allowed to pick a
particular value of n for each expectation query, rather than using the same value of n for all expectation
queries. We do not consider this as a weakness of PTP, since E_P itself, chosen by programmers, affects the
accuracy of (Σ_i v_i)/n after all.

Although PTP provides no concrete guidance in choosing a value of n in the rule Exp, programmers
can empirically determine a suitable value of n, namely the largest value of n that finishes an expectation
query within a given time constraint. (A large value of n is better because it results in a more faithful
approximation of P by samples V_i and a smaller difference between (Σ_i v_i)/n and the true expectation ∫ f dP.)
Ideally the time to evaluate expectation M_f M_P should be directly proportional to n, but in practice, the
computation of the same expression E_P may take a different time, especially if E_P expresses a recursive
computation. Therefore programmers can try different values of n and find the largest one that finishes the
expectation query within a given time constraint.

A problem with the above definition is that although expectation is a term construct, its reduction is
probabilistic because of sampling sequence ω_i in the rule Exp. This violates the principle that a term
evaluation is always deterministic, and now the same term may evaluate to different values if it contains
expectation. In order not to violate the principle, we assume that sampling sequence ω_i in the rule Exp is
uniquely determined by expression E_P.

Now we can calculate p_{Mary_calls|John_calls} as:

    expectation (λx:bool. if x then 1.0 else 0.0) Q_{Mary_calls|John_calls}


3.4.2  Bayes  operation 

The previous implementation of the Bayes operation P ♯ Q assumes a function p and a constant c such that
p(x) = kP(x) ≤ c for a certain constant k. It is, however, often difficult to find the optimal value of c (i.e.,
the maximum value of p(x)) and we have to take a conservative estimate of c. The Monte Carlo method,
in conjunction with importance sampling [40], allows us to dispense with c by approximating Q with a set
of samples and P ♯ Q with a set of weighted samples. We introduce a term construct bayes for the Bayes
operation and an expression construct importance for importance sampling:

    term        M ::= ··· | bayes M_p M_Q
    expression  E ::= ··· | importance {(V_i, w_i) | 1 ≤ i ≤ n}


In the spirit of data abstraction, importance represents only an internal data structure and is not directly
available to programmers.


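\[ \frac{\Gamma \vdash_p M_p : A \to \mathsf{real} \qquad \Gamma \vdash_p M_Q : \bigcirc A}{\Gamma \vdash_p \mathsf{bayes}\; M_p\; M_Q : \bigcirc A} \;\textsf{Bayes} \]


\[ \frac{\Gamma \vdash_p V_i : A \qquad \Gamma \vdash_p w_i : \mathsf{real} \qquad 1 \le i \le n}{\Gamma \vdash_p \mathsf{importance}\; \{(V_i, w_i) \mid 1 \le i \le n\} \div A} \]


\[ \frac{\begin{array}{c} M_p \mapsto_t^* p \qquad M_Q \mapsto_t^* \mathsf{prob}\; E_Q \\ \text{for } i = 1, \cdots, n \qquad \text{new sampling sequence } \omega_i \qquad E_Q \,@\, \omega_i \mapsto_e^* V_i \,@\, \omega_i' \qquad p\; V_i \mapsto_t^* w_i \end{array}}{\mathsf{bayes}\; M_p\; M_Q \mapsto_t \mathsf{prob}\; \mathsf{importance}\; \{(V_i, w_i) \mid 1 \le i \le n\}} \;\textsf{Bayes} \]


\[ \frac{\dfrac{\sum_{i=1}^{k-1} w_i}{S} < r \le \dfrac{\sum_{i=1}^{k} w_i}{S} \qquad \text{where } S = \sum_{i=1}^{n} w_i}{\mathsf{importance}\; \{(V_i, w_i) \mid 1 \le i \le n\} \,@\, r\omega \mapsto_e V_k \,@\, \omega} \;\textsf{Imp} \]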


The rule Bayes uses sampling sequences ω_1, ···, ω_n initialized by the run-time system and approximates
Q with n samples V_1, ···, V_n, where n is a run-time variable as in the rule Exp. Then it applies p to each
sample V_i to calculate its weight w_i and creates a set {(V_i, w_i) | 1 ≤ i ≤ n} of weighted samples as an
argument to importance. The rule Imp implements importance sampling: we use a random number r to
probabilistically select a sample V_k by taking into account the weights associated with all the samples. As
with expectation, we decide to define bayes as a term construct with the assumption that sampling sequence
ω_i in the rule Bayes is uniquely determined by expression E_Q.
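The two rules can be sketched together in Python as follows. The sketch assumes that Q is modeled as a function from a random number generator to a sample, and it derives the n sampling sequences from one seed for repeatability; both choices, and the function names, are ours.

```python
import random


def importance(weighted):
    """Sampler that selects V_k with probability w_k / S (rule Imp)."""
    total = sum(w for _, w in weighted)
    if total <= 0.0:
        raise ValueError("all weights are zero: the Bayes operation is undefined")

    def sample(rng):
        r = 1.0 - rng.random()  # random number r in (0.0, 1.0]
        acc = 0.0
        last = None
        for v, w in weighted:
            if w > 0.0:
                acc += w
                last = v
                if r <= acc / total:  # cumulative weight first reaches r
                    return v
        return last  # guards against rounding when r is close to 1.0

    return sample


def bayes(p, sample_q, n, seed=0):
    """Approximate the Bayes operation with n weighted samples (rule Bayes)."""
    master = random.Random(seed)
    weighted = []
    for _ in range(n):
        v_i = sample_q(random.Random(master.getrandbits(64)))  # sample V_i from Q
        weighted.append((v_i, p(v_i)))                         # weight w_i = p V_i
    return importance(weighted)
```

Unlike the rejection-based version, no bound c on p is needed; the price is that the result is only as good as the n samples drawn from Q when `bayes` is evaluated.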
3.4.3  expectation and bayes as expression constructs

Since their reduction involves sampling sequences, expectation and bayes could be defined as expression
constructs so that the assumption on sampling sequence ω_i (in the rules Exp and Bayes) would be unnecessary.
Still we choose to define expectation and bayes as term constructs for pragmatic reasons. Consider a
probability distribution P(s) defined in terms of probability distributions Q(s) and R(u):

\[ P(s) = \eta \, Q(s) \int f(s, u) R(u) \, du \]

(A similar example is found in Section 5.3.) P(s) is obtained by the Bayes operation between Q(s) and
Prob(s) = ∫ f(s, u)R(u) du, and is encoded in PTP as

    bayes (λs:_. expectation (λu:_. M_f (s, u)) M_R) M_Q

where M_Q and M_R are probability terms denoting Q and R, respectively, and M_f is a lambda abstraction
denoting f. If expectation were an expression construct, however, it would be difficult to encode P(s)
because expression expectation (λu:_. M_f (s, u)) M_R cannot be converted into a term. In essence,
mathematically the expectation of a function with respect to a probability distribution and the result of a Bayes
operation are always unique (if they exist), which in turn implies that if expectation and bayes are defined
as expression constructs, we cannot write code involving expectations and Bayes operations in the same
manner that we reason mathematically.

The actual implementation of PTP (to be presented in the next chapter) does not enforce the assumption
on sampling sequence ω_i in the rules Exp and Bayes, which is unrealistic in practice and required only
for the semantic clarity of PTP. Strictly speaking, therefore, term evaluations are not necessarily deterministic
and there is no clear separation between terms and expressions in this regard. Since terms are not
protected from computational effects (such as input/output and mutable references) and term evaluations
do not always result in unique values anyway, non-deterministic term evaluations should not be regarded
as a new problem. Thus expressions are best interpreted as a syntactic category dedicated to probabilistic
computations only in the mathematical sense; strict adherence at the implementation level to the semantic
distinction between terms and expressions (e.g., defining expectation and bayes as expression constructs)
would cost code readability without any apparent benefit.


3.4.4  Cost  of  generating  random  numbers 

The essence of the Monte Carlo method is to trade accuracy for cost: it only gives approximate answers,
but relieves programmers of the cost of exact computation (which can be even impossible in certain problems).
Since PTP relies on the Monte Carlo method to reason about probability distributions, it is important
for programmers to be able to determine the cost of the Monte Carlo method.

We decide to define the cost of the Monte Carlo method as proportional to the number of random numbers
consumed. The decision is based upon the assumption that random number generation can account for
a significant portion of the total computation time. (If the cost of random number generation were negligible,
the number of random numbers consumed would be of little importance.) Under our implementation of PTP,
random number generation for the following examples from Section 3.2 accounts for an average of 74.85%
of the total computation time. The following table shows execution times (in seconds) and percentages of
random number generation when generating 100,000 samples (on a Pentium III 500 MHz with 384 MB of
memory):
    test case                         execution time (s)    random number generation (%)
    uniform 0.0 1.0                          0.25                     78.57
    binomial 0.25 16                         4.65                     64.84
    geometric_efix 0.25                      1.21                     63.16
    gaussian_rejection 2.5 5.0               1.13                     77.78
    exponential_von_Neumann_1.0              1.09                     80.76
    gaussian_Box_Muller 2.0 4.0              0.57                     77.27
    gaussian_central 0.0 1.0                 2.79                     83.87
    Q_{Mary_calls|John_calls}               21.35                     72.57

In PTP, it is the programmers' responsibility to reason about the cost of generating random numbers,
since for an expression computation judgment E @ ω ↦_e* V @ ω', the length of the consumed sequence
ω - ω' is not observable. An analysis similar to those in Section 3.3 can be used to estimate the cost of
obtaining a sample in terms of the number of random numbers consumed. In the case of geometric_rec p,
for example, the expected number n of random numbers consumed is calculated by solving the equation

    n = 1 + (1 - p) * n

where 1 accounts for the random number generated from the Bernoulli distribution and (1 - p) is the
probability that another attempt is made to generate a sample from the same probability distribution; solving
the equation gives n = 1/p. The same technique applies equally to the rejection method (e.g.,
gaussian_rejection).
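The cost estimate n = 1/p can be checked empirically by instrumenting the source of random numbers. In the Python sketch below, `CountingSource` is a hypothetical wrapper that counts consumed numbers, and geometric_rec p is rendered as a loop rather than a recursion.

```python
import random


class CountingSource:
    """A sampling sequence that records how many random numbers were consumed."""

    def __init__(self, seed):
        self._rng = random.Random(seed)
        self.consumed = 0

    def next(self):
        self.consumed += 1
        return 1.0 - self._rng.random()  # a number in (0.0, 1.0]


def geometric_rec(p, source):
    """Count the failed Bernoulli trials before the first success."""
    failures = 0
    while not (source.next() < p):  # one Bernoulli trial per random number
        failures += 1
    return failures


def average_cost(p, samples, seed=0):
    """Average number of random numbers consumed per generated sample."""
    source = CountingSource(seed)
    for _ in range(samples):
        geometric_rec(p, source)
    return source.consumed / samples
```

For p = 0.25 the average cost should be close to 4 random numbers per sample, and for p = 0.5 close to 2.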

3.5  Summary 

Although conceptually simple, the idea of using sampling functions in specifying probability distributions
is new in the history of probabilistic languages. PTP is an example of probabilistic language that indirectly
expresses sampling functions in a monadic syntax. We could also choose a different syntax for expressing
sampling functions. For example, the author [53] extends the lambda calculus with a sampling construct γ.e
to directly encode sampling functions (γ is a formal argument and e denotes the body of a sampling function).
The computation of γ.e proceeds by generating a random number from U(0.0, 1.0] and substituting it
for γ in e. Compared with PTP, the resultant calculus facilitates the encoding of some probability distributions
(e.g., γ.γ for U(0.0, 1.0]), but it also reduces code readability because every program fragment denotes
a probability distribution and there is no separation between regular values and probabilistic computations.

The idea of using a monadic syntax for PTP was inspired by the stochastic lambda calculus of Ramsey
and Pfeffer [64], whose denotational semantics is based upon the monad of probability measures, or the
probability monad [22]. In implementing a query for generating samples from probability distributions,
they note that the probability monad can also be interpreted in terms of sampling functions, both
denotationally and operationally. In designing PTP, we take the opposite approach: first we use a monadic syntax
for probabilistic computations and relate it directly to sampling functions; then we interpret it in terms of
probability distributions.

The operational semantics of PTP can be presented in different styles. For example, expression computations
could use a judgment of the form E ↦^{r_1 r_2 ··· r_n} V, meaning that expression E computes to sample V by
consuming a finite sequence of random numbers r_1, r_2, ···, r_n. Although the new judgment better reflects
the actual implementation of expression computation, we stick to the formulation given in this chapter to
emphasize the logical foundation of PTP.

Chapter 4

Implementation 


This chapter describes the implementation of PTP. Instead of implementing PTP as a complete programming
language of its own, we choose to embed it in an existing functional language for two pragmatic reasons.
First, the conceptual basis of probabilistic computations in PTP is simple enough that it is easy to simulate all
language constructs of PTP without any modification to the run-time system. Second, we intend to use PTP
for real applications in robotics, for which we wish to exploit advanced features such as a module system,
an interface to foreign languages, and a graphics library. Hence building a complete compiler for PTP is not
justified when extending an existing functional language is sufficient for examining the practicality of PTP.

We emphasize that embedding PTP in an existing functional language is different from adding a library
to the host language. For example, the syntax of the host language is extended with the syntax of PTP,
which is not the case when a library is added. Since the type system of PTP is also faithfully reflected in the
host language, programmers can benefit from the type system of PTP even when programming in the host
language environment. (A library can also partially reflect the type system of PTP through type abstraction,
but not completely because of different syntax in the library.)

In our implementation, we use Objective CAML [2] as the host language. First we formulate a sound
and complete translation of PTP into a simple call-by-value language which can be thought of as a sublanguage
of Objective CAML. Then we extend the syntax of Objective CAML using CAMLP4, a preprocessor for
Objective CAML, to incorporate the syntax of PTP. The extended syntax is translated back into the original
syntax.


4.1  Representation  of  sampling  functions 

Since a probability term denotes a probability distribution specified by a sampling function, the
implementation of PTP translates probability terms into representations of sampling functions. We translate a
probability term of type $\bigcirc A$ into a value of type A prob, where the type constructor prob is conceptually
defined as follows:

    type A prob = $\mathsf{real}^{\infty}$ -> A * $\mathsf{real}^{\infty}$

real is the type of real numbers, and we use $\mathsf{real}^{\infty}$ for the type of infinite sequences of random numbers.

We simplify the definition of prob in two steps. First we implement real numbers of type real as
floating point numbers of type float (as in Objective CAML). Second we dispense with infinite sequences
of random numbers by using a global random number generator whenever fresh random numbers are needed
to compute sampling expressions. Thus we use the following definition of prob:

    type A prob = unit -> A




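\[
\begin{array}{llcl}
\text{type} & A, B & ::= & A \to A \mid \bigcirc A \mid \mathsf{real} \\
\text{term} & M, N & ::= & x \mid \lambda x{:}A.\,M \mid M\;M \mid \mathsf{prob}\;E \mid r \\
\text{expression} & E, F & ::= & M \mid \mathsf{sample}\;x\;\mathsf{from}\;M\;\mathsf{in}\;E \mid \mathcal{S} \mid x \mid \mathsf{efix}\;x \div A.\,E \\
\text{value/sample} & V & ::= & \lambda x{:}A.\,M \mid \mathsf{prob}\;E \mid r \\
\text{floating point number} & r & & \\
\text{sampling sequence} & \omega & ::= & r_1 r_2 \cdots r_i \cdots \quad \text{where } r_i \in (0.0, 1.0] \\
\text{typing context} & \Gamma & ::= & \cdot \mid \Gamma, x : A \mid \Gamma, x \div A
\end{array}
\]

Figure 4.1: A fragment of PTP as the source language.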


Here unit is the unit type, which is inhabited only by a unit value ().
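The two definitions of prob can be contrasted in Objective CAML itself. The following is only a minimal
sketch under stated assumptions, not the code of the PTP implementation: the names prob_stream,
uniform_stream, uniform, and constant are hypothetical, and the standard Seq module stands in for infinite
sequences of random numbers.

```ocaml
(* Conceptual definition: a sampling function consumes a prefix of an
   infinite sequence of random numbers and returns the unused rest.
   float Seq.t is an assumption standing in for real^infinity. *)
type 'a prob_stream = float Seq.t -> 'a * float Seq.t

(* Simplified definition: random numbers come from a global generator. *)
type 'a prob = unit -> 'a

(* The uniform distribution in the conceptual style: take the first
   random number and hand back the rest of the sequence. *)
let uniform_stream : float prob_stream = fun w ->
  match w () with
  | Seq.Cons (r, rest) -> (r, rest)
  | Seq.Nil -> invalid_arg "sampling sequence must be infinite"

(* The same distribution in the simplified style. The expression
   1.0 -. Random.float 1.0 follows the footnote in Section 4.2. *)
let uniform : float prob = fun () -> 1.0 -. Random.float 1.0

(* A constant sequence, used here only to make the example deterministic. *)
let rec constant (r : float) : float Seq.t =
  fun () -> Seq.Cons (r, constant r)
```

With the first representation the caller threads the sequence explicitly, as in
uniform_stream (constant 0.5); with the second, uniform () draws from hidden global state.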

The use of type float instead of type real means that we use finite precision in representing sampling
functions. Although the overhead of exact real arithmetic is not justified in those applications (e.g., robotics)
where we work with samples and approximations, programmers may demand higher precision than is
supported by type float. As a contrived example, consider a binary distribution assigning probability 0.25 to
True and probability 0.75 to False:

    prob sample x from prob $\mathcal{S}$ in
         2.0 * x < 0.5

If type float uses only one bit in the mantissa part (and $\mathcal{S}$ computes to either 0.5 or 1.0), the above probability
term denotes a wrong probability distribution (namely a point-mass distribution centered on False); only
with two or more bits in the mantissa part does it denote the intended probability distribution. Therefore, while
the finite precision supported by the implementation of PTP (64-bit floating point numbers in Objective
CAML) is adequate for typical applications, it should also be noted that there can be sampling functions
demanding higher precision and that errors induced by floating point numbers can be problematic in some
applications.

We use the type constructor prob as an abstract datatype. That is, the definition of prob is not exposed to
PTP and values of type A prob are accessed only via member functions. We provide two member functions:
prb and app. prb builds a value of type A prob from a function of type unit -> A; it is actually defined
as an identity function. app generates a sample from a value of type A prob; it applies its argument to a
unit value. The interface and implementation of the abstract datatype prob are given as follows:

    interface                              implementation
    type A prob                            type A prob = unit -> A
    val prb : (unit -> A) -> A prob        let prb = fun f:unit -> A. f
    val app : A prob -> A                  let app = fun f:A prob. f ()

We use prb in translating probability terms and app in translating bind expressions. In conjunction with
the use of the type constructor prob as an abstract datatype, they provide a sound and complete translation
of PTP, as shown in the next section.
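In Objective CAML, an interface and implementation of this kind can be written as a module sealed with a
signature. The sketch below is one possible rendering, not the author's actual code: the module name Prob and
the helper values uniform and point_mass are assumptions introduced for illustration.

```ocaml
module Prob : sig
  type 'a prob                          (* abstract: unit -> 'a is hidden *)
  val prb : (unit -> 'a) -> 'a prob
  val app : 'a prob -> 'a
end = struct
  type 'a prob = unit -> 'a
  let prb f = f                         (* identity function *)
  let app p = p ()                      (* apply to the unit value *)
end

(* Hypothetical clients: they can build and sample distributions, but
   cannot apply a value of type 'a Prob.prob directly. *)
let uniform : float Prob.prob =
  Prob.prb (fun () -> 1.0 -. Random.float 1.0)

let point_mass (v : 'a) : 'a Prob.prob = Prob.prb (fun () -> v)
```

Because the signature hides the definition of prob, an expression such as uniform () is rejected by the type
checker; samples can be obtained only through Prob.app.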

4.2  Translation  of  PTP  in  a  call-by-value  language 

We translate a fragment of PTP shown in Figure 4.1 into a call-by-value language shown in Figure 4.2. The
source language excludes product types, which are straightforward to translate if the target language is
extended with product types. We directly translate expression fixed point constructs without simulating
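\[
\begin{array}{llcl}
\text{type} & A, B & ::= & A \to A \mid A\;\mathsf{prob} \mid \mathsf{float} \mid \mathsf{unit} \\
\text{expression} & e, f & ::= & x \mid \mathsf{fun}\;x{:}A.\,e \mid e\;e \mid \mathsf{prb}\;e \mid \mathsf{app}\;e \mid r \mid () \mid \mathsf{random} \mid \mathsf{fix}\;x{:}A.\,u \\
\text{value} & v & ::= & \mathsf{fun}\;x{:}A.\,e \mid \mathsf{prb}\;v \mid r \mid () \\
\text{function} & u & ::= & \mathsf{fun}\;x{:}A.\,e \\
\text{floating point number} & r & & \\
\text{sampling sequence} & \omega & ::= & r_1 r_2 \cdots r_i \cdots \quad \text{where } r_i \in (0.0, 1.0] \\
\text{typing context} & \Gamma & ::= & \cdot \mid \Gamma, x : A
\end{array}
\]

Figure 4.2: A call-by-value language as the target language.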

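\[
\dfrac{}{\Gamma, x : A \vdash_v x : A}\;\mathsf{Hyp}
\qquad
\dfrac{\Gamma, x : A \vdash_v e : B}{\Gamma \vdash_v \mathsf{fun}\;x{:}A.\,e : A \to B}\;\mathsf{Lam}
\qquad
\dfrac{\Gamma \vdash_v e_1 : A \to B \quad \Gamma \vdash_v e_2 : A}{\Gamma \vdash_v e_1\;e_2 : B}\;\mathsf{App}
\]
\[
\dfrac{\Gamma \vdash_v e : \mathsf{unit} \to A}{\Gamma \vdash_v \mathsf{prb}\;e : A\;\mathsf{prob}}\;\mathsf{Prb}
\qquad
\dfrac{\Gamma \vdash_v e : A\;\mathsf{prob}}{\Gamma \vdash_v \mathsf{app}\;e : A}\;\mathsf{Papp}
\qquad
\dfrac{}{\Gamma \vdash_v r : \mathsf{float}}\;\mathsf{Float}
\]
\[
\dfrac{}{\Gamma \vdash_v () : \mathsf{unit}}\;\mathsf{Unit}
\qquad
\dfrac{}{\Gamma \vdash_v \mathsf{random} : \mathsf{float}}\;\mathsf{Random}
\qquad
\dfrac{\Gamma, x : A \vdash_v u : A}{\Gamma \vdash_v \mathsf{fix}\;x{:}A.\,u : A}\;\mathsf{Fix}
\]

Figure 4.3: Typing rules of the target language.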


them with fixed point constructs for terms. As the target language supports only floating point numbers, $r$
in the source language is restricted to floating point numbers.

The target language is a call-by-value language extended with the abstract datatype prob. It has a single
syntactic category consisting of expressions (because it does not distinguish between effect-free
evaluations and effectful computations). As in PTP, every expression denotes a probabilistic computation and we
say that an expression computes to a value. Note that fixed point constructs fix $x{:}A.\,u$ allow recursive
expressions only over function types.

The type system of the target language is shown in Figure 4.3. It employs a typing judgment $\Gamma \vdash_v e : A$,
meaning that expression $e$ has type $A$ under typing context $\Gamma$. The rules Prb and Papp conform to the
interface of the abstract datatype prob.

The operational semantics of the target language is shown in Figure 4.4. It employs an expression
reduction judgment $e @ \omega \mapsto_v e' @ \omega'$, meaning that the computation of $e$ with sampling sequence $\omega$
reduces to the computation of $e'$ with sampling sequence $\omega'$. A capture-avoiding substitution $[e/x]f$ is
defined in a standard way. The rule $E_{\mathit{AppPrb}}$ is defined according to the implementation of the abstract
datatype prob. The rule $E_{\mathit{Random}}$ shows that random, like sampling expressions in PTP, consumes a random
number in a given sampling sequence. We write $\mapsto_v^*$ for the reflexive and transitive closure of $\mapsto_v$.

Figure 4.5 shows the translation of the source language into the target language.¹ We overload the function
$[\cdot]_v$ for types, typing contexts, terms, and expressions. Both terms and expressions of type $A$ in the source
language are translated into expressions of type $[A]_v$ in the target language. $[\mathsf{prob}\;E]_v$ suspends the
computation of $[E]_v$ by building a function fun _:unit. $[E]_v$, just as prob $E$ suspends the computation of $E$.
Since the target language allows recursive expressions only over function types, an expression variable $x$ of
type $A$ (i.e., $x \div A$) is translated into $x_x\;()$ where $x_x$ is a special variable of type unit $\to [A]_v$ annotated

¹ _ is a wildcard pattern for variables and types.
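\[
\dfrac{e @ \omega \mapsto_v e' @ \omega'}{e\;f @ \omega \mapsto_v e'\;f @ \omega'}\;E_{\beta_L}
\qquad
\dfrac{f @ \omega \mapsto_v f' @ \omega'}{(\mathsf{fun}\;x{:}A.\,e)\;f @ \omega \mapsto_v (\mathsf{fun}\;x{:}A.\,e)\;f' @ \omega'}\;E_{\beta_R}
\]
\[
\dfrac{}{(\mathsf{fun}\;x{:}A.\,e)\;v @ \omega \mapsto_v [v/x]e @ \omega}\;E_{\beta_V}
\qquad
\dfrac{e @ \omega \mapsto_v e' @ \omega'}{\mathsf{prb}\;e @ \omega \mapsto_v \mathsf{prb}\;e' @ \omega'}\;E_{\mathit{Prb}}
\]
\[
\dfrac{e @ \omega \mapsto_v e' @ \omega'}{\mathsf{app}\;e @ \omega \mapsto_v \mathsf{app}\;e' @ \omega'}\;E_{\mathit{App}}
\qquad
\dfrac{}{\mathsf{app}\;\mathsf{prb}\;v @ \omega \mapsto_v v\;() @ \omega}\;E_{\mathit{AppPrb}}
\]
\[
\dfrac{}{\mathsf{random} @ r\omega \mapsto_v r @ \omega}\;E_{\mathit{Random}}
\qquad
\dfrac{}{\mathsf{fix}\;x{:}A.\,u @ \omega \mapsto_v [\mathsf{fix}\;x{:}A.\,u/x]u @ \omega}\;E_{\mathit{Fix}}
\]

Figure 4.4: Operational semantics of the target language.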


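\[
\begin{array}{rcl}
[A \to B]_v & = & [A]_v \to [B]_v \\
[\bigcirc A]_v & = & [A]_v\;\mathsf{prob} \\
[\mathsf{real}]_v & = & \mathsf{float} \\[4pt]
[\cdot]_v & = & \cdot \\
[\Gamma, x : A]_v & = & [\Gamma]_v, x : [A]_v \\
[\Gamma, x \div A]_v & = & [\Gamma]_v, x_x : \mathsf{unit} \to [A]_v \\[4pt]
[x]_v & = & x \qquad \text{(term variable)} \\
[\lambda x{:}A.\,M]_v & = & \mathsf{fun}\;x{:}[A]_v.\,[M]_v \\
[M\;N]_v & = & [M]_v\;[N]_v \\
[\mathsf{prob}\;E]_v & = & \mathsf{prb}\;(\mathsf{fun}\;\_{:}\mathsf{unit}.\,[E]_v) \\
[r]_v & = & r \\[4pt]
[\mathsf{sample}\;x\;\mathsf{from}\;M\;\mathsf{in}\;E]_v & = & (\mathsf{fun}\;x{:}\_.\,[E]_v)\;(\mathsf{app}\;[M]_v) \\
[\mathcal{S}]_v & = & \mathsf{random} \\
[x]_v & = & x_x\;() \qquad \text{(expression variable)} \\
[\mathsf{efix}\;x \div A.\,E]_v & = & (\mathsf{fix}\;x_x{:}\mathsf{unit} \to [A]_v.\,\mathsf{fun}\;\_{:}\mathsf{unit}.\,[E]_v)\;()
\end{array}
\]

Figure 4.5: Translation of the source language.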


with $x$; if the target language allowed recursive expressions over any type, $x$ and efix $x \div A.\,E$ could be
translated into $x_x$ and fix $x_x{:}[A]_v.\,[E]_v$, respectively.²
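The translation of Figure 4.5 can itself be sketched as a pair of mutually recursive Objective CAML functions
over abstract syntax trees. This is an illustrative sketch only: type annotations are omitted, and the
constructor names, the functions tr_term and tr_expr, and the naming scheme special (which produces the
special variable for an expression variable) are all assumptions, not part of the PTP implementation.

```ocaml
(* Source language of Figure 4.1, with type annotations omitted. *)
type term =
  | Var of string
  | Lam of string * term
  | App of term * term
  | Prob of expr
  | Real of float
and expr =
  | Term of term
  | Sample of string * term * expr   (* sample x from M in E *)
  | Uniform                          (* the sampling expression *)
  | EVar of string                   (* expression variable *)
  | Efix of string * expr            (* efix x. E *)

(* Target language of Figure 4.2, with type annotations omitted. *)
type target =
  | TVar of string
  | TFun of string * target
  | TApp of target * target
  | TPrb of target
  | TPapp of target                  (* app e *)
  | TFloat of float
  | TUnit
  | TRandom
  | TFix of string * target

(* Hypothetical name of the special variable annotated with x. *)
let special x = "x_" ^ x

let rec tr_term = function
  | Var x -> TVar x
  | Lam (x, m) -> TFun (x, tr_term m)
  | App (m, n) -> TApp (tr_term m, tr_term n)
  | Prob e -> TPrb (TFun ("_", tr_expr e))       (* suspend the computation *)
  | Real r -> TFloat r
and tr_expr = function
  | Term m -> tr_term m
  | Sample (x, m, e) -> TApp (TFun (x, tr_expr e), TPapp (tr_term m))
  | Uniform -> TRandom
  | EVar x -> TApp (TVar (special x), TUnit)
  | Efix (x, e) -> TApp (TFix (special x, TFun ("_", tr_expr e)), TUnit)
```

For example, tr_term (Prob Uniform) yields TPrb (TFun ("_", TRandom)), the tree for prb (fun _:unit. random).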

Propositions 4.1 and 4.2 show that the translation is faithful to the type system of the source language.
Proposition 4.1 proves the soundness of the translation: a well-typed term or expression in the source
language is translated into a well-typed expression in the target language. Proposition 4.2 proves the
completeness of the translation: only a well-typed term or expression in the source language is translated into a
well-typed expression in the target language.

Proposition 4.1.

If $\Gamma \vdash_p M : A$, then $[\Gamma]_v \vdash_v [M]_v : [A]_v$.

If $\Gamma \vdash_p E \div A$, then $[\Gamma]_v \vdash_v [E]_v : [A]_v$.

Proof. By simultaneous induction on the structure of $M$ and $E$. □

² In the Objective CAML syntax, $[\mathsf{efix}\;x \div A.\,E]_v$ can be rewritten as let rec $x_x$ () = $[E]_v$ in $x_x$ ().

Proposition 4.2.

If $[\Gamma]_v \vdash_v [M]_v : A$, then there exists $B$ such that $A = [B]_v$ and $\Gamma \vdash_p M : B$.

If $[\Gamma]_v \vdash_v [E]_v : A$, then there exists $B$ such that $A = [B]_v$ and $\Gamma \vdash_p E \div B$.

Proof. By simultaneous induction on the structure of $M$ and $E$. The conclusion in the first clause also
implies $\Gamma \vdash_p M \div B$. An interesting case is when $E = x$.

Case $E = x$:

    $[\Gamma]_v \vdash_v [x]_v : A$                    by assumption

    $[\Gamma]_v \vdash_v x_x\;() : A$                  because $[x]_v = x_x\;()$

    $x_x : \mathsf{unit} \to A \in [\Gamma]_v$         by App and Unit

    Since $x_x$ is a special variable annotated with expression variable $x$,

    $x \div B \in \Gamma$ and $A = [B]_v$ for some $B$.

    $A = [B]_v$ and $\Gamma \vdash_p x \div B$. □

The translation is also faithful to the operational semantics of the source language. We first show that the
translation is sound: a term reduction in the source language is translated into a corresponding expression
reduction which consumes no random number (Proposition 4.6); an expression reduction in the source
language is translated into a corresponding sequence of expression reductions which consumes the same
sequence of random numbers (Proposition 4.7). Note that in Proposition 4.7, $[E]_v$ does not directly reduce
to $[F]_v$; instead it reduces to an expression $e$ to which $[F]_v$ eventually reduces without consuming random
numbers.


Lemma 4.3. $[[M/x]N]_v = [[M]_v/x][N]_v$ and $[[M/x]E]_v = [[M]_v/x][E]_v$.

Proof. By simultaneous induction on the structure of $N$ and $E$. □

Lemma 4.4.

$[[\mathsf{efix}\;x \div A.\,E/x]M]_v = [(\mathsf{fix}\;x_x{:}\mathsf{unit} \to [A]_v.\,\mathsf{fun}\;\_{:}\mathsf{unit}.\,[E]_v)/x_x][M]_v$.

$[[\mathsf{efix}\;x \div A.\,E/x]F]_v = [(\mathsf{fix}\;x_x{:}\mathsf{unit} \to [A]_v.\,\mathsf{fun}\;\_{:}\mathsf{unit}.\,[E]_v)/x_x][F]_v$.

Proof. By simultaneous induction on the structure of $M$ and $F$. □

Corollary 4.5.

$[[\mathsf{efix}\;x \div A.\,E/x]E]_v = [(\mathsf{fix}\;x_x{:}\mathsf{unit} \to [A]_v.\,\mathsf{fun}\;\_{:}\mathsf{unit}.\,[E]_v)/x_x][E]_v$.

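Proposition 4.6.

If $M \mapsto_t N$, then $[M]_v @ \omega \mapsto_v [N]_v @ \omega$ for any sampling sequence $\omega$.

Proof. By induction on the structure of the derivation of $M \mapsto_t N$.

Case $\dfrac{M \mapsto_t M'}{M\;N \mapsto_t M'\;N}$:

    $[M]_v @ \omega \mapsto_v [M']_v @ \omega$                                by induction hypothesis
    $[M\;N]_v = [M]_v\;[N]_v$
    $[M]_v\;[N]_v @ \omega \mapsto_v [M']_v\;[N]_v @ \omega$                  by $E_{\beta_L}$
    $[M']_v\;[N]_v = [M'\;N]_v$

Case $\dfrac{N \mapsto_t N'}{(\lambda x{:}A.\,M)\;N \mapsto_t (\lambda x{:}A.\,M)\;N'}$:

    $[N]_v @ \omega \mapsto_v [N']_v @ \omega$                                by induction hypothesis
    $[(\lambda x{:}A.\,M)\;N]_v = (\mathsf{fun}\;x{:}[A]_v.\,[M]_v)\;[N]_v$
    $(\mathsf{fun}\;x{:}[A]_v.\,[M]_v)\;[N]_v @ \omega \mapsto_v (\mathsf{fun}\;x{:}[A]_v.\,[M]_v)\;[N']_v @ \omega$    by $E_{\beta_R}$
    $(\mathsf{fun}\;x{:}[A]_v.\,[M]_v)\;[N']_v = [(\lambda x{:}A.\,M)\;N']_v$

Case $(\lambda x{:}A.\,M)\;V \mapsto_t [V/x]M$:

    $[(\lambda x{:}A.\,M)\;V]_v = (\mathsf{fun}\;x{:}[A]_v.\,[M]_v)\;[V]_v$
    $(\mathsf{fun}\;x{:}[A]_v.\,[M]_v)\;[V]_v @ \omega \mapsto_v [[V]_v/x][M]_v @ \omega$    by $E_{\beta_V}$
    $[[V]_v/x][M]_v = [[V/x]M]_v$                                             by Lemma 4.3
□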

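Proposition 4.7.

If $E @ \omega \mapsto_e F @ \omega'$, there exists $e$ such that $[E]_v @ \omega \mapsto_v^* e @ \omega'$ and $[F]_v @ \omega' \mapsto_v^* e @ \omega'$.

Proof. By induction on the structure of the derivation of $E @ \omega \mapsto_e F @ \omega'$. We consider two interesting
cases.

Case $\dfrac{E @ \omega \mapsto_e E' @ \omega'}{\mathsf{sample}\;x\;\mathsf{from}\;\mathsf{prob}\;E\;\mathsf{in}\;F @ \omega \mapsto_e \mathsf{sample}\;x\;\mathsf{from}\;\mathsf{prob}\;E'\;\mathsf{in}\;F @ \omega'}\;E_{\mathit{BindR}}$:

    $[E]_v @ \omega \mapsto_v^* e @ \omega'$ where $[E']_v @ \omega' \mapsto_v^* e @ \omega'$      by induction hypothesis

    $[\mathsf{sample}\;x\;\mathsf{from}\;\mathsf{prob}\;E\;\mathsf{in}\;F]_v = (\mathsf{fun}\;x{:}\_.\,[F]_v)\;(\mathsf{app}\;(\mathsf{prb}\;(\mathsf{fun}\;\_{:}\mathsf{unit}.\,[E]_v)))$

    $(\mathsf{fun}\;x{:}\_.\,[F]_v)\;(\mathsf{app}\;(\mathsf{prb}\;(\mathsf{fun}\;\_{:}\mathsf{unit}.\,[E]_v))) @ \omega$
    $\mapsto_v (\mathsf{fun}\;x{:}\_.\,[F]_v)\;((\mathsf{fun}\;\_{:}\mathsf{unit}.\,[E]_v)\;()) @ \omega$           by $E_{\mathit{AppPrb}}$
    $\mapsto_v (\mathsf{fun}\;x{:}\_.\,[F]_v)\;[E]_v @ \omega$                                    by $E_{\beta_V}$
    $\mapsto_v^* (\mathsf{fun}\;x{:}\_.\,[F]_v)\;e @ \omega'$                                     by $[E]_v @ \omega \mapsto_v^* e @ \omega'$

    $[\mathsf{sample}\;x\;\mathsf{from}\;\mathsf{prob}\;E'\;\mathsf{in}\;F]_v = (\mathsf{fun}\;x{:}\_.\,[F]_v)\;(\mathsf{app}\;(\mathsf{prb}\;(\mathsf{fun}\;\_{:}\mathsf{unit}.\,[E']_v)))$

    $(\mathsf{fun}\;x{:}\_.\,[F]_v)\;(\mathsf{app}\;(\mathsf{prb}\;(\mathsf{fun}\;\_{:}\mathsf{unit}.\,[E']_v))) @ \omega'$
    $\mapsto_v^* (\mathsf{fun}\;x{:}\_.\,[F]_v)\;[E']_v @ \omega'$                                by $E_{\mathit{AppPrb}}$ and $E_{\beta_V}$
    $\mapsto_v^* (\mathsf{fun}\;x{:}\_.\,[F]_v)\;e @ \omega'$                                     by $[E']_v @ \omega' \mapsto_v^* e @ \omega'$

Case $\mathsf{efix}\;x \div A.\,E @ \omega \mapsto_e [\mathsf{efix}\;x \div A.\,E/x]E @ \omega$:

    $[\mathsf{efix}\;x \div A.\,E]_v = (\mathsf{fix}\;x_x{:}\mathsf{unit} \to [A]_v.\,\mathsf{fun}\;\_{:}\mathsf{unit}.\,[E]_v)\;()$

    $(\mathsf{fix}\;x_x{:}\mathsf{unit} \to [A]_v.\,\mathsf{fun}\;\_{:}\mathsf{unit}.\,[E]_v)\;() @ \omega$
    $\mapsto_v (\mathsf{fun}\;\_{:}\mathsf{unit}.\,[\mathsf{fix}\;x_x{:}\mathsf{unit} \to [A]_v.\,\mathsf{fun}\;\_{:}\mathsf{unit}.\,[E]_v/x_x][E]_v)\;() @ \omega$    by $E_{\mathit{Fix}}$
    $\mapsto_v [\mathsf{fix}\;x_x{:}\mathsf{unit} \to [A]_v.\,\mathsf{fun}\;\_{:}\mathsf{unit}.\,[E]_v/x_x][E]_v @ \omega$                         by $E_{\beta_V}$

    $[[\mathsf{efix}\;x \div A.\,E/x]E]_v = [\mathsf{fix}\;x_x{:}\mathsf{unit} \to [A]_v.\,\mathsf{fun}\;\_{:}\mathsf{unit}.\,[E]_v/x_x][E]_v$    by Corollary 4.5
□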


The completeness of the translation states that only a valid term or expression reduction in the source
language is translated into a corresponding sequence of expression reductions in the target language. In
other words, a term or expression that cannot be further reduced in the source language is translated into
an expression whose reduction eventually gets stuck. To simplify the presentation, we introduce three
judgments, all of which express that a term or expression does not reduce further.

•  $M \mapsto_t \bullet$ means that there exists no term to which $M$ reduces.

•  $E @ \omega \mapsto_e \bullet$ means that there exists no expression to which $E$ reduces.

•  $e @ \omega \mapsto_v \bullet$ means that there exists no expression to which $e$ reduces (in the target language).

Corollary 4.9 proves the completeness of the translation for terms; Proposition 4.10 proves the
completeness of the translation for expressions.

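Proposition 4.8. If $[M]_v @ \omega \mapsto_v e @ \omega'$, then $e = [N]_v$, $\omega = \omega'$, and $M \mapsto_t N$.

Proof. By induction on the structure of $M$. We only need to consider the case $M = M_1\;M_2$. There are
three cases of the structure of $[M_1\;M_2]_v @ \omega \mapsto_v e @ \omega'$ (corresponding to the rules $E_{\beta_L}$, $E_{\beta_R}$, and $E_{\beta_V}$). The
case for the rule $E_{\beta_V}$ uses Lemma 4.3. □

Corollary 4.9. If $M \mapsto_t \bullet$, then $[M]_v @ \omega \mapsto_v \bullet$ for any sampling sequence $\omega$.

Proposition 4.10. If $E @ \omega \mapsto_e \bullet$, then there exists $e$ such that $[E]_v @ \omega \mapsto_v^* e @ \omega \mapsto_v \bullet$.

Proof. By induction on the structure of $E$. We consider two cases $E = M$ and $E = \mathsf{sample}\;x\;\mathsf{from}\;M\;\mathsf{in}\;F$;
the remaining cases are all trivial.

Case $E = M$, $[E]_v = [M]_v$:

    $M \mapsto_t \bullet$                                        by $E_{\mathit{Term}}$
    $[M]_v @ \omega \mapsto_v \bullet$                           by Corollary 4.9
    We let $e = [M]_v$.

Case $E = \mathsf{sample}\;x\;\mathsf{from}\;M\;\mathsf{in}\;F$, $[E]_v = (\mathsf{fun}\;x{:}\_.\,[F]_v)\;(\mathsf{app}\;[M]_v)$:

    If $M \neq \mathsf{prob}\;\cdot$,
        $M \mapsto_t \bullet$                                    because $E @ \omega \mapsto_e \bullet$
        $[M]_v @ \omega \mapsto_v \bullet$                       by Corollary 4.9
        The rule $E_{\mathit{App}}$ does not apply to $[E]_v$.
        The rule $E_{\mathit{AppPrb}}$ does not apply to $[E]_v$.           $[M]_v \neq \mathsf{prb}\;\cdot$
        We let $e = [E]_v$.

    If $M = \mathsf{prob}\;E'$, $E' \neq V$,
        $E' @ \omega \mapsto_e \bullet$                          by $E_{\mathit{BindR}}$
        There exists $e'$ such that $[E']_v @ \omega \mapsto_v^* e' @ \omega \mapsto_v \bullet$      by induction hypothesis
        $[E]_v @ \omega$
        $\mapsto_v^* (\mathsf{fun}\;x{:}\_.\,[F]_v)\;[E']_v @ \omega$       $[M]_v = \mathsf{prb}\;\mathsf{fun}\;\_{:}\mathsf{unit}.\,[E']_v$
        $\mapsto_v^* (\mathsf{fun}\;x{:}\_.\,[F]_v)\;e' @ \omega$           $[E']_v @ \omega \mapsto_v^* e' @ \omega$
        $(\mathsf{fun}\;x{:}\_.\,[F]_v)\;e' @ \omega \mapsto_v \bullet$     $e' @ \omega \mapsto_v \bullet$
        We let $e = (\mathsf{fun}\;x{:}\_.\,[F]_v)\;e'$.

    If $M = \mathsf{prob}\;V$, then $E @ \omega \mapsto_e \bullet$ does not hold because of the rule $E_{\mathit{BindV}}$. □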


The target language can be thought of as a sublanguage of Objective CAML in which the abstract
datatype prob is built-in and random is implemented as Random.float 1.0.³ Since Objective CAML
also serves as the host language for PTP, we need to extend the syntax of Objective CAML to incorporate
the syntax of PTP. The extended syntax is then translated back into the original syntax of Objective CAML
using the function $[\cdot]_v$. The next section gives the definition of the extended syntax.


4.3  Extended  syntax 

We use CAMLP4 to conservatively extend the syntax of Objective CAML, which is assumed to be specified
by a non-terminal ⟨term⟩ (corresponding to terms in PTP), with a new non-terminal ⟨expr⟩ (corresponding
to expressions in PTP); ⟨patt⟩ is a non-terminal for patterns and ⟨id⟩ for identifiers:

    ⟨term⟩ ::= ··· | PROB { ⟨expr⟩ }                          probability term
    ⟨expr⟩ ::= [⟨term⟩]                                       term as an expr.
             | sample ⟨patt⟩ from ⟨term⟩ in ⟨expr⟩            bind expr.
             | UNIFORM                                        sampling expr.
             | efix ⟨id⟩ -> ⟨expr⟩                            expr. fixed point construct
             | #⟨id⟩                                          expr. variable
             | unprob ⟨term⟩                                  unprob
             | eif ⟨term⟩ then ⟨expr⟩ else ⟨expr⟩             eif

³ To be strict, random would be implemented as 1.0 -. Random.float 1.0.

[⟨term⟩] explicitly marks a term as an instance of expression. #⟨id⟩ refers to an expression variable ⟨id⟩.
All other expression constructs resemble their counterparts in Chapter 3.

As an example, we encode a Bernoulli distribution over type bool as follows:

    let bernoulli = fun p ->
      PROB { sample x from PROB { UNIFORM } in
             [if x <= p then true else false] }

A geometric distribution is encoded with an expression fixed point construct as follows:

    let geometric = fun p ->
      let bernoulli_p = bernoulli p in
      PROB {
        efix geo ->
          sample b from bernoulli_p in
          eif b then [0]
          else
            sample x from PROB { #geo } in
            [1 + x]
      }

All other examples in Section 3.2 can be encoded in a similar way.
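To make the role of the preprocessor concrete, the two encodings above can be translated by hand into plain
Objective CAML following Figure 4.5. The code below is a sketch of what such a translation might look like, not
actual CAMLP4 output: prb, app, and random are local stand-ins for the abstract datatype of Section 4.1, and the
efix construct is written with let rec as suggested by the footnote in Section 4.2.

```ocaml
(* Stand-ins for the abstract datatype prob and the target construct random. *)
type 'a prob = unit -> 'a
let prb (f : unit -> 'a) : 'a prob = f
let app (p : 'a prob) : 'a = p ()
let random () = 1.0 -. Random.float 1.0

(* PROB { sample x from PROB { UNIFORM } in
          [if x <= p then true else false] } *)
let bernoulli = fun p ->
  prb (fun () ->
    (fun x -> if x <= p then true else false)
      (app (prb (fun () -> random ()))))

(* PROB { efix geo -> sample b from bernoulli_p in
          eif b then [0] else sample x from PROB { #geo } in [1 + x] } *)
let geometric = fun p ->
  let bernoulli_p = bernoulli p in
  prb (fun () ->
    let rec geo () =
      (fun b ->
         if b then 0
         else (fun x -> 1 + x) (app (prb (fun () -> geo ()))))
        (app bernoulli_p)
    in
    geo ())
```

A sample is then drawn with app (geometric 0.25). Each bind expression becomes an application of a function
to app of the translated term, so call-by-value evaluation draws the sample before the body runs.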

4.4  Approximate  computation 

In PTP, reasoning about a probability distribution is accomplished by generating multiple samples and
analyzing them. The implementation of PTP provides two functions for generating independent samples
from a given probability distribution:

    type 'a set
    type 'a wset

    val prob_to_set : 'a prob -> 'a set
    val prob_to_wset : 'a prob -> ('a -> float) -> 'a wset

•  'a set is a datatype for sets of samples of type 'a.

•  'a wset is a datatype for sets of weighted samples of type 'a. Each sample is assigned a weight of
   type float and 'a wset may be thought of as ('a * float) set. All weights are normalized
   (i.e., their sum is 1.0).

•  prob_to_set p generates samples from p by evaluating app p repeatedly.

•  prob_to_wset p f generates samples from p and assigns to each sample V a weight of f V.

Programmers can specify the number of samples generated from prob_to_set and prob_to_wset,
thereby controlling the accuracy in approximating probability distributions.
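The behavior of these two functions can be sketched as follows. This is a minimal illustration under stated
assumptions rather than the library's code: sets are represented as lists, prob is exposed as unit -> 'a, and the
mutable parameter num_samples is a hypothetical way of letting programmers choose the number of samples.

```ocaml
type 'a prob = unit -> 'a
let app (p : 'a prob) : 'a = p ()

(* Assumed representations: plain lists of samples and of weighted samples. *)
type 'a set = 'a list
type 'a wset = ('a * float) list

(* Hypothetical global parameter: how many samples to generate. *)
let num_samples = ref 1000

(* Evaluate app p repeatedly. *)
let prob_to_set (p : 'a prob) : 'a set =
  List.init !num_samples (fun _ -> app p)

(* Weight each sample V by f V, then normalize so the weights sum to 1.0. *)
let prob_to_wset (p : 'a prob) (f : 'a -> float) : 'a wset =
  let weighted = List.map (fun v -> (v, f v)) (prob_to_set p) in
  let total = List.fold_left (fun acc (_, w) -> acc +. w) 0.0 weighted in
  if total <= 0.0 then invalid_arg "prob_to_wset: weights sum to zero"
  else List.map (fun (v, w) -> (v, w /. total)) weighted
```

Raising num_samples trades running time for a closer approximation of the underlying distribution.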

The implementation of PTP provides two functions for applying the Monte Carlo method:

    val set_monte_carlo : 'a set -> ('a -> float) -> float
    val wset_monte_carlo : 'a wset -> ('a -> float) -> float

•  set_monte_carlo s f returns $\dfrac{\sum_{V \in s} f\;V}{|s|}$.

•  wset_monte_carlo ws f returns $\sum_{(V, w) \in ws} (f\;V) \cdot w$.

[Figure 4.6: wset_to_prob_truncate. Panels: ws; wset_to_prob_truncate ws.]
The following two functions convert sets and weighted sets back to probability distributions:

    val set_to_prob_resample : 'a set -> 'a prob
    val wset_to_prob_resample : 'a wset -> 'a prob

•  set_to_prob_resample s returns a uniform distribution over s.

•  wset_to_prob_resample ws returns prob importance ws, which performs importance sampling
   on ws to select samples.

Now the expectation query (in Section 3.4.1) and the Bayes operation (in Section 3.4.2) are implemented by
composing these functions:

    expectation f p = set_monte_carlo (prob_to_set p) f
    bayes f p = wset_to_prob_resample (prob_to_wset p f)
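The composition can be sketched end to end in Objective CAML. The block below is an illustrative sketch, not
the PTP library: sets are lists, prob is exposed as unit -> 'a, num_samples is a hypothetical fixed sample count,
and wset_to_prob_resample is realized as drawing a sample with probability equal to its normalized weight,
which is one reading of the importance sampling step.

```ocaml
type 'a prob = unit -> 'a
type 'a set = 'a list
type 'a wset = ('a * float) list

let num_samples = 1000   (* hypothetical sample count *)

let prob_to_set (p : 'a prob) : 'a set = List.init num_samples (fun _ -> p ())

let prob_to_wset (p : 'a prob) (f : 'a -> float) : 'a wset =
  let ws = List.map (fun v -> (v, f v)) (prob_to_set p) in
  let total = List.fold_left (fun acc (_, w) -> acc +. w) 0.0 ws in
  List.map (fun (v, w) -> (v, w /. total)) ws

(* (sum of f V over V in s) / |s| *)
let set_monte_carlo (s : 'a set) (f : 'a -> float) : float =
  List.fold_left (fun acc v -> acc +. f v) 0.0 s /. float_of_int (List.length s)

(* sum of (f V) * w over (V, w) in ws *)
let wset_monte_carlo (ws : 'a wset) (f : 'a -> float) : float =
  List.fold_left (fun acc (v, w) -> acc +. f v *. w) 0.0 ws

(* Uniform choice among the samples of a non-empty set. *)
let set_to_prob_resample (s : 'a set) : 'a prob =
  let a = Array.of_list s in
  fun () -> a.(Random.int (Array.length a))

(* Pick a sample of a non-empty weighted set according to its weight;
   fall back to the last positively weighted sample on rounding error. *)
let wset_to_prob_resample (ws : 'a wset) : 'a prob =
  let a = Array.of_list ws in
  let n = Array.length a in
  fun () ->
    let u = Random.float 1.0 in
    let rec pick i acc last =
      if i = n then last
      else
        let (v, w) = a.(i) in
        let last = if w > 0.0 then v else last in
        if w > 0.0 && u < acc +. w then v else pick (i + 1) (acc +. w) last
    in
    pick 0 0.0 (fst a.(0))

let expectation f p = set_monte_carlo (prob_to_set p) f
let bayes f p = wset_to_prob_resample (prob_to_wset p f)
```

For instance, bayes (fun b -> if b then 1.0 else 0.0) applied to a fair coin yields a distribution that only
produces true, since samples of weight zero are never selected.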

The implementation of PTP also provides a function for approximating the support of a given probability
distribution. Since the support of an arbitrary probability distribution cannot be calculated accurately, we
represent it as a uniform distribution:

    val wset_to_prob_truncate : 'a wset -> 'a prob

wset_to_prob_truncate ws returns a uniform distribution over n samples of highest weights in ws,
where n is the parameter specifying the number of samples generated by prob_to_set and prob_to_wset.
Figure 4.6 illustrates how wset_to_prob_truncate works. ws has five samples in it, and
wset_to_prob_truncate is invoked when the parameter n is set to three. The two samples with
lowest weights perish, and all the surviving samples are assigned the same weight.

wset_to_prob_truncate is useful particularly when we want to extract a small number of
samples of high weights from a probability distribution. For (an approximation of) the uniform distribution over
the support of p, we use wset_to_prob_truncate (prob_to_wset p (fun _ -> 1.0)),
where (fun _ -> 1.0) is a constant Objective CAML function returning 1.0.
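The truncation step can be sketched as a sort followed by a prefix selection. This is a hedged illustration, not
the library's implementation: weighted sets are lists of pairs, prob is exposed as unit -> 'a, and the parameter
n is passed explicitly here (in the text it is the global sample count), so the signature differs from the one
given above. The helper take is hypothetical.

```ocaml
type 'a prob = unit -> 'a
type 'a wset = ('a * float) list

(* First n elements of a list (hypothetical helper). *)
let rec take n = function
  | x :: rest when n > 0 -> x :: take (n - 1) rest
  | _ -> []

(* Uniform distribution over the n samples of highest weight in a
   non-empty weighted set; the discarded samples can never be drawn. *)
let wset_to_prob_truncate (n : int) (ws : 'a wset) : 'a prob =
  let sorted = List.sort (fun (_, w1) (_, w2) -> compare w2 w1) ws in
  let kept = Array.of_list (take n sorted) in
  fun () -> fst kept.(Random.int (Array.length kept))
```

With five samples of weights 0.10, 0.30, 0.05, 0.35, and 0.20 (made-up values mirroring the situation in
Figure 4.6) and n set to three, only the samples weighted 0.35, 0.30, and 0.20 survive, each drawn with
probability one third.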
Figure 4.7: Horizontal and vertical computations.


4.5  Simultaneous  computation  of  multiple  samples 

The implementation of PTP uses a simple strategy to generate multiple samples from a given probability
distribution: compute the same expression repeatedly. An alternative strategy is to perform a single
parallel computation that simulates multiple independent computations. To distinguish the two kinds of
computations, we refer to the former strategy as vertical computations and the latter strategy as a horizontal
computation, as shown in Figure 4.7.

A horizontal computation can be potentially faster than an equivalent number of vertical computations.
For example, a horizontal computation of sample x from M in E avoids the overhead of evaluating the same
term M more than once; thus the advantage of a horizontal computation becomes pronounced if M takes a
long time to evaluate. The cost associated with each language construct also remains constant in a horizontal
computation. For example, a horizontal computation of sample x from M in E performs a substitution for
x only once, but vertical computations perform one substitution for x per sample.

To examine the potential benefit of horizontal computations, we implement a translator of PTP for
horizontal computations. Conceptually an expression now computes to an ordered set of samples in such a way
that each sample corresponds to the result of an independent vertical computation of the same expression.
We may think of the translator as implementing an operational semantics based upon the judgment

\[
E @ [\omega_1, \cdots, \omega_n] \mapsto_e^* [V_1, \cdots, V_n] @ [\omega_1', \cdots, \omega_n']
\]

which means $E @ \omega_i \mapsto_e^* V_i @ \omega_i'$ for $1 \le i \le n$.

The translator is implemented in a similar way to the operational semantics for vertical computations:
the syntax of Objective CAML is extended using CAMLP4, and terms and expressions of the extended
syntax are translated back into Objective CAML. The definition of the type constructor prob, however, is
more complex because of conditional constructs (if · then · else · and eif · then · else ·). To motivate our
definition of prob, consider the following expression:

    sample x from prob $\mathcal{S}$ in
    sample y from prob $E$ in
    eif x < 0.5 then $E_1$ else $E_2$

A vertical computation reduces the whole expression to either $E_1$ or $E_2$ and needs to keep only one reduced
expression. A horizontal computation, however, may have to keep both $E_1$ and $E_2$ because multiple samples
are generated from $U(0.0, 1.0]$ for variable x. For example, if an ordered set $\{0.1, 0.6, 0.3, 0.9\}$ is generated
for variable x, the horizontal computation reduces to two smaller horizontal computations: one of $E_1$ with
x bound to $\{0.1, -, 0.3, -\}$ and another of $E_2$ with x bound to $\{-, 0.6, -, 0.9\}$. Note that we may not
compress $\{0.1, -, 0.3, -\}$ to $\{0.1, 0.3\}$ and $\{-, 0.6, -, 0.9\}$ to $\{0.6, 0.9\}$ because the ordered set to which
variable y is bound may be correlated to variable x.

Thus we are led to define the type constructor prob using bit vectors and ordered sets:

    type bflag
    type 'a oset

    type 'a prob = bflag -> 'a oset

•  bflag is the type of bit vectors of fixed size.

•  'a oset is a datatype for ordered sets of element type 'a. An ordered set of element type 'a may
   contain not only ordinary values of type 'a but also null values ('-' in the above example). Ordinary
   values correspond to values of 1 and null values to values of 0 in bit vectors.

•  'a prob is a datatype for both probability distributions over type 'a and expressions of type 'a.
   It is defined as the type of a function that takes a bit vector, performs a horizontal computation for
   values of 1 in the given bit vector, and returns the resultant ordered set.
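One way to picture these definitions is the following sketch, which is an assumption-laden illustration rather
than the translator's actual code: bflag is an int used as a bit mask, 'a oset is an array of options with None
as the null value, and the helpers live, all_live, oset_map, uniform, bernoulli, and eif are hypothetical names.

```ocaml
let width = 31                       (* number of simulated computations *)
type bflag = int                     (* bit i set: computation i is live *)
type 'a oset = 'a option array       (* None plays the role of a null value *)
type 'a prob = bflag -> 'a oset

let live (b : bflag) (i : int) : bool = b land (1 lsl i) <> 0
let all_live : bflag = (1 lsl width) - 1

(* One fresh random number for every live position, null elsewhere. *)
let uniform : float prob = fun b ->
  Array.init width (fun i ->
    if live b i then Some (1.0 -. Random.float 1.0) else None)

(* Point-wise map over the ordinary values of an ordered set. *)
let oset_map (f : 'a -> 'b) (s : 'a oset) : 'b oset = Array.map (Option.map f) s

let bernoulli (p : float) : bool prob = fun b ->
  oset_map (fun x -> x <= p) (uniform b)

(* Conditional: split the bit vector by a boolean ordered set, run each
   branch on its own positions, and merge the results without compressing. *)
let eif (c : bool oset) (e1 : 'a prob) (e2 : 'a prob) : 'a prob = fun b ->
  let mask want =
    let m = ref 0 in
    Array.iteri
      (fun i v -> if live b i && v = Some want then m := !m lor (1 lsl i))
      c;
    !m
  in
  let r1 = e1 (mask true) in
  let r2 = e2 (mask false) in
  Array.init width (fun i ->
    match r1.(i) with
    | Some _ as v -> v
    | None -> r2.(i))
```

Because positions are never compressed, the i-th element of every ordered set always belongs to the same
simulated vertical computation, which preserves correlations between variables such as x and y above.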


Since variables from bind expressions are always bound to ordered sets, we distinguish between terms
manipulating ordinary values and terms manipulating ordered sets. The new syntax, further augmenting the
extended syntax in Section 4.3, introduces a non-terminal ⟨pterm⟩ for those terms manipulating ordered
sets; the definition of the non-terminal ⟨expr⟩ uses ⟨pterm⟩ in place of ⟨term⟩:

    ⟨term⟩  ::= ··· | ⟨pterm⟩
    ⟨pterm⟩ ::= lam ⟨patt⟩ -> ⟨pterm⟩                          lambda abstraction
              | app ⟨pterm⟩ to ⟨pterm⟩                         application term
              | pif ⟨pterm⟩ then ⟨pterm⟩ else ⟨pterm⟩          cond. term construct
              | @⟨id⟩                                          variable
              | const ⟨term⟩                                   constants
              | ptrue | pfalse | @+ | CMP <=.                  built-in constants


In the new syntax, a Bernoulli distribution and a geometric distribution are encoded as follows:

    let bernoulli = fun p ->
      PROB { sample x from PROB { UNIFORM } in
             [pif @x CMP <=. const p then ptrue else pfalse] }

    let geometric = fun p ->
      let bernoulli_p = bernoulli p in
      PROB {
        efix geo ->
          sample b from bernoulli_p in
          eif @b then [const 0]
          else
            sample x from PROB { #geo } in
            [const 1 @+ @x]
      }

Compared with the examples in Section 4.3, the code is the same except that all terms within expressions
manipulate ordered sets rather than ordinary values.




    test case                          vertical    horizontal    overhead (%)
    bernoulli 0.25                        0.922         1.188           28.85
    uniform 0.0 1.0                       0.906         1.078           18.98
    binomial 0.25 16                     16.563        23.187           39.99
    geometric_efix 0.25                   3.937         7.157           81.78
    gaussian_rejection 2.5 5.0            4.688         7.593           61.96
    exponential_von_Neumann 1.0           4.031         6.922           71.71
    gaussian_Box_Muller 2.0 4.0           4.796         5.031            4.89
    gaussian_central 0.0 1.0             10.594        12.157           14.75
    Q_Mary_calls|John_calls              90.063       138.922           54.24

Figure 4.8: Execution times (in seconds) for generating a total of 3,100,000 samples.


Experimental  results 

We compare execution times for generating the same number of samples in vertical and horizontal
computations. The type bflag uses 31-bit integers (of type int in Objective CAML), which means that a
single horizontal computation performs up to 31 independent vertical computations; the datatype 'a oset
uses arrays of 31 elements of type 'a. We use an AMD Athlon XP 1.67GHz with 512MB memory for all
experiments.
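The representation just described can be sketched in a few lines of Objective CAML. This is a minimal illustration, not the translator's actual code: only the type names bflag and 'a oset come from the text, while width, all_live, active, bernoulli_h and split are hypothetical names introduced here.

```ocaml
(* Hypothetical sketch of the data layout for horizontal computations.
   Only the type names [bflag] and ['a oset] appear in the text. *)

type bflag = int           (* bit i set: the i-th vertical computation is live *)
type 'a oset = 'a array    (* one slot per simulated vertical computation *)

let width = 31
let all_live : bflag = (1 lsl width) - 1

(* Is the i-th vertical computation live in [flag]? *)
let active (flag : bflag) (i : int) : bool = flag land (1 lsl i) <> 0

(* Horizontal Bernoulli: one call draws a value for every live slot.
   The array always has [width] slots, however few of them are live. *)
let bernoulli_h (p : float) (flag : bflag) : bool oset =
  Array.init width (fun i -> active flag i && Random.float 1.0 <= p)

(* Split the live slots by a condition, as a conditional construct needs to:
   returns the flags for the "then" branch and for the "else" branch. *)
let split (flag : bflag) (cond : bool oset) : bflag * bflag =
  let t = ref 0 in
  Array.iteri
    (fun i c -> if active flag i && c then t := !t lor (1 lsl i))
    cond;
  (!t, flag land (lnot !t))
```

In this sketch every call to bernoulli_h allocates a full array of 31 slots even when flag has a single bit set, which is the allocation cost discussed in the experimental results.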

Figure 4.8 shows execution times for various test cases from Chapter 3. In all test cases, horizontal
computations are slower than vertical computations, as indicated by their overhead relative to vertical
computations. The overhead of horizontal computations is especially high in those test cases involving
conditional constructs (namely, binomial, geometric_efix, gaussian_rejection, exponential_von_Neumann 1.0,
and Q_{Mary_calls|John_calls}). The high overhead can be attributed to the fact that a horizontal computation
allocates an array of size 31 for every expression, regardless of the number of ordinary values from it. For
example, even when a horizontal computation is simulating just a single vertical computation (after
encountering several conditional constructs), the computation of an expression still requires an array of size
31.

The experimental results show that the overhead for maintaining ordered sets and handling conditional
constructs exceeds the gain from simulating multiple vertical computations with a single horizontal
computation. Our implementation is just a translator which does not rely on support from the compiler. In order
to fully realize the potential of horizontal computations, it seems necessary to integrate the implementation
within the compiler and the run-time system. As a speculation, horizontal computations can be up to
twice as fast as vertical computations: random number generation, which costs the same in both vertical
and horizontal computations, accounts for about half the total computation time; hence, with no overhead
other than random number generation, horizontal computations would be about twice as fast as vertical
computations.
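The speculated factor of two can be made concrete with a small calculation. This is only one reading of the argument, under the assumptions that generating one random number costs $r$, that the remaining work of one vertical computation costs $c$ with $r \approx c$, and that an ideal horizontal computation performs the remaining work once for all 31 slots. The symbols $r$, $c$, $T_v$ and $T_h$ are introduced here for illustration.

```latex
\begin{align*}
T_v &= 31\,(r + c)  && \text{31 vertical computations} \\
T_h &= 31\,r + c    && \text{one ideal horizontal computation} \\
\frac{T_v}{T_h} &= \frac{31\,(r + c)}{31\,r + c}
  = \frac{62}{32} \approx 1.94 \qquad \text{when } r = c
\end{align*}
```

Under these assumptions the ratio approaches 2 as the number of slots grows, since the cost of random number generation is the part that cannot be shared.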


4.6  Summary 

Although PTP is implemented indirectly via a translation in Objective CAML, both its type system and its
operational semantics are faithfully mirrored through the use of an abstract datatype. Besides, all existing
features of Objective CAML are available when programming in PTP, and we may think of the implementation
of PTP as a conservative extension of Objective CAML. The translation is easily generalized to any
monadic language, thus complementing the well-established result that a call-by-value language is translated
into a monadic language (e.g., see [68]).

The translator of PTP does not protect terms from computational effects already available in Objective
CAML such as input/output, mutable references, and even direct uses of Random.float. Thus, for
example, term M in a bind expression sample x from M in E is supposed to produce no world effect, but
the translator has no way to verify that the evaluation of M is effect-free. Therefore the translator of PTP
relies on programmers to ensure that every term denotes a regular value.

Since the linguistic framework for PTP is a reformulation of Moggi's monadic metalanguage $\lambda_{ml}$ (see
Chapter 2), Haskell is also a good choice as a host language for embedding PTP. To embed PTP in Haskell,
one would define a Haskell monad, say Prob, for probabilistic choices and translate an expression of
type A into a program fragment of type Prob A, while ignoring the keyword prob in probability terms.
Alternatively one could exploit the global random number generator maintained by the IO monad and
translate ○A of PTP into IO A of Haskell. (Our choice of Objective CAML is due to personal preference.)

We could directly implement PTP by extending the compiler and the run-time system of Objective
CAML. An immediate benefit is that type error messages would be more informative because type errors
would be detected at the level of PTP. (Our implementation detects type errors in the translated code rather
than in the source code; hence programmers must analyze type error messages to locate type errors in the
source code.) As for execution speed, we conjecture that the gain is negligible, since the only overhead incurred
by the abstract datatype prob is to invoke two tiny functions when its member functions are invoked: an
identity function (for prb) and a function applying its argument to a unit value (for app).
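To see why the overhead of the abstract datatype is small, it helps to write the datatype out. The following is a minimal sketch, assuming that a probability term is represented as a suspended computation of type unit -> 'a; the module name Prob, the signature, and the two example values are hypothetical, and only the member names prb and app are taken from the text.

```ocaml
(* Hypothetical sketch of the abstract datatype; the representation
   [unit -> 'a] is an assumption made for illustration. *)
module Prob : sig
  type 'a prob
  val prb : (unit -> 'a) -> 'a prob   (* wrap a suspended computation *)
  val app : 'a prob -> 'a             (* run it once, drawing one sample *)
end = struct
  type 'a prob = unit -> 'a
  let prb f = f        (* an identity function *)
  let app p = p ()     (* applies its argument to the unit value *)
end

(* Assumed shape of translated code for a uniform and a Bernoulli
   distribution; Random.float is OCaml's standard generator. *)
let uniform : float Prob.prob = Prob.prb (fun () -> Random.float 1.0)

let bernoulli (p : float) : bool Prob.prob =
  Prob.prb (fun () -> let x = Prob.app uniform in x <= p)
```

Because the type is abstract, client code can only build and run probability terms through prb and app, which is how such an encoding can mirror the type system of the source language inside the host language.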
Chapter 5

Applications


This chapter presents three applications of PTP in robotics: robot localization, people tracking, and robotic
mapping, all of which are popular topics in robotics. Although different in goal, all these applications share
a common characteristic: the state of a robot is estimated from sensor readings, where the definition of state
differs in each case. A key element of these applications is uncertainty in sensor readings, due to limitations
of sensors and noise from the environment. It makes the problem of estimating the state of a robot both
interesting and challenging: if all sensor readings were accurate, the state of a robot could be accurately
traced by a simple (non-probabilistic) analysis of sensor readings. In order to cope with uncertainty in
sensor readings, we estimate the state of a robot with probability distributions.

As a computational framework, we use Bayes filters. In each case, we formulate the update equations
at the level of probability distributions and translate them into PTP. All implementations are tested using data
collected with real robots.


5.1  Sensor  readings:  action  and  measurement 

To  update  the  state  of  a  robot,  we  use  two  kinds  of  sensor  readings:  action  and  measurement.  As  in  a  Bayes 
filter,  an  action  induces  a  state  change  whereas  a  measurement  gives  information  on  the  state: 

• An action $a$ is represented as an odometer reading which returns the pose (i.e., position $(x, y)$ and
orientation $\theta$) of the robot relative to its initial pose. It is given as a tuple $(\Delta x, \Delta y, \Delta\theta)$.

• A measurement $m$ consists of range readings which return distances to objects visible at certain
angles. It is given as an array $[d_1; \cdots; d_n]$ where each $d_i$, $1 \le i \le n$, denotes the distance between the
robot and the closest object visible at a certain angle.

Figure  5.1  shows  a  typical  example  of  measurement.  It  displays  range  readings  produced  by  a  laser  range 
finder  covering  180  degrees.  The  robot  is  shown  in  the  center;  occluded  regions  are  colored  in  grey. 

Odometers  and  range  finders  are  prone  to  errors  because  of  their  mechanical  nature.  An  odometer 
usually  tends  to  drift  in  one  direction  over  time.  Its  accumulated  error  becomes  manifest  especially  when 
the  robot  closes  a  loop  after  taking  a  circular  route.  Range  finders  occasionally  fail  to  recognize  obstacles  and 
report  the  maximum  distance  measurable.  In  order  to  correct  these  errors,  we  use  a  probabilistic  approach 
by  representing  the  state  of  the  robot  with  a  probability  distribution. 

In  the  probabilistic  approach,  an  action  increases  the  set  of  possible  states  of  the  robot  because  it  induces 
a  state  change  probabilistically.  In  contrast,  a  measurement  decreases  the  set  of  possible  states  of  the  robot 
because  it  gives  negative  information  on  unlikely  states  (and  positive  information  on  likely  states).  We  now 
demonstrate  how  to  probabilistically  update  the  state  of  the  robot  in  three  different  applications. 
Figure 5.1: Range readings produced by a laser range finder. The robot faces a person on its right, visible as the
shadows of two legs.


5.2  Robot  localization 

Robot  localization  [72]  is  the  problem  of  estimating  the  pose  of  a  robot  when  a  map  of  the  environment  is 
available. If the initial pose is given, the problem becomes pose tracking, which keeps track of the robot
pose by compensating for errors in sensor readings. If the initial pose is not given, the problem becomes global
localization  which  begins  with  multiple  hypotheses  on  the  robot  pose  (and  is  therefore  more  involved  than 
pose  tracking). 

We consider robot localization under the assumption (called the Markov assumption) that the past and
the future are independent if the current pose is known, or equivalently that the environment is static. This
assumption allows us to use a Bayes filter in estimating the robot pose. Specifically, the state in the Bayes
filter is the robot pose $s = (x, y, \theta)$, and we estimate $s$ with a probability distribution $Bel(s)$ over
three-dimensional real space. We compute $Bel(s)$ according to the following update equations (which are the
same as shown in Section 1.1):

(5.1)  $Bel(s) \leftarrow \int \mathcal{A}(s \mid a, s')\, Bel(s')\, ds'$

(5.2)  $Bel(s) \leftarrow \eta\, \mathcal{P}(m \mid s)\, Bel(s)$

$\eta$ is a normalizing constant ensuring $\int Bel(s)\, ds = 1.0$. We use the following interpretation of
$\mathcal{A}(s \mid a, s')$ and $\mathcal{P}(m \mid s)$:

• $\mathcal{A}(s \mid a, s')$ is the probability that the robot moves to pose $s$ after taking action $a$ at another pose $s'$.
$\mathcal{A}$ is called an action model.

• $\mathcal{P}(m \mid s)$ is the probability that measurement $m$ is taken at pose $s$. $\mathcal{P}$ is called a perception model.

Given an action $a$ and a pose $s'$, a new pose $s$ can be generated from the action model $\mathcal{A}(\cdot \mid a, s')$ by
adding noise to $a$ and applying it to $s'$. In our implementation, $\mathcal{A}(\cdot \mid a, s')$ assumes constant translational
and rotational velocities while action $a$ is taken from pose $s'$. It also assumes that errors in translational and
rotational velocities obey Gaussian distributions. Figure 5.2 shows samples of the new pose after taking a
curved trajectory.

Figure 5.2: Samples from the action model.
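The idea of generating a new pose by perturbing the action can be sketched as follows. This is not the action model used in our implementation: instead of the constant-velocity model it uses a simpler straight-line displacement, it assumes that the action is expressed in the local frame of the robot, and all names (pose, action, gaussian, sample_action_model, sigma_trans, sigma_rot) are hypothetical.

```ocaml
(* Hypothetical sketch: a straight-line motion model with Gaussian noise,
   standing in for the constant-velocity model described in the text. *)

type pose = { x : float; y : float; theta : float }
type action = { dx : float; dy : float; dtheta : float }

(* One Gaussian sample via the Box-Muller transform. *)
let gaussian (mean : float) (sigma : float) : float =
  let pi = 4.0 *. atan 1.0 in
  let u1 = max epsilon_float (Random.float 1.0) in   (* avoid log 0 *)
  let u2 = Random.float 1.0 in
  mean +. sigma *. sqrt (-2.0 *. log u1) *. cos (2.0 *. pi *. u2)

(* Draw a new pose s from an approximation of A(. | a, s'): perturb the
   travelled distance and the rotation, then apply them to the old pose. *)
let sample_action_model ~(sigma_trans : float) ~(sigma_rot : float)
    (a : action) (s' : pose) : pose =
  let dist = sqrt (a.dx *. a.dx +. a.dy *. a.dy) in
  let heading = atan2 a.dy a.dx in
  let dist' = gaussian dist sigma_trans in
  let dtheta' = gaussian a.dtheta sigma_rot in
  let dir = s'.theta +. heading in
  { x = s'.x +. dist' *. cos dir;
    y = s'.y +. dist' *. sin dir;
    theta = s'.theta +. dtheta' }
```

Calling sample_action_model repeatedly with the same action and pose yields a cloud of poses of the kind a sample-based belief needs; with both standard deviations set to zero it reduces to plain dead reckoning.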

Given a measurement $m$ and a pose $s$, we can also compute $\kappa\mathcal{P}(m \mid s)$ where $\kappa$ is an unknown constant:
the map determines a unique (accurate) measurement $m_s$ for pose $s$, and the squared Euclidean distance
between $m$ and $m_s$ is assumed to be proportional to $\mathcal{P}(m \mid s)$. Figures 5.3 and 5.4 illustrate how to compute
$\kappa\mathcal{P}(m \mid s)$. Figure 5.3 shows points in the map that correspond to measurement $m$ when $s$ is set to the true
pose of the robot, in which case the unique measurement $m_s$ for pose $s$ coincides with $m$ (recall that a
measurement consists not of points in the map but of range readings). Hence each point is projected on the
contour of the map and is assigned a high likelihood as indicated by the dark color. Figure 5.4 shows points
in the map that correspond to the same measurement $m$, but when $s$ is set to a hypothetical pose of the robot;
the unique measurement $m_s$ for pose $s$ is represented by points with crosses. Since the measurement is not
taken at the hypothetical pose, no point is correctly aligned along the contour of the map. Thus each point
is assigned a relatively low likelihood as indicated by the grey color (the degree of darkness indicates its
likelihood). We compute $\kappa\mathcal{P}(m \mid s)$ as the product of all individual likelihoods.¹

Our implementation simplifies the computation of $\kappa\mathcal{P}(m \mid s)$ by approximating $m_s$ with those points on
the contour of the map that are closest to the points corresponding to measurement $m$; Figure 5.5 shows how
to approximate $m_s$ with those points with crosses. This simplification allows us to precompute the likelihood
of every point in the map, since its closest point on the contour of the map is fixed. Our implementation uses
a grid map at 10 centimeter resolution and generates a likelihood map which stores the likelihood of each
cell in the map; see Figure 5.6 for a grid map and its likelihood map.
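With a precomputed likelihood map, evaluating the perception model reduces to table lookups. The sketch below illustrates this under several assumptions that are not taken from our implementation: range readings are spread evenly over the 180 degrees in front of the robot, points outside the map receive a small floor likelihood, and the names likelihood_map, lookup and perception are hypothetical.

```ocaml
(* Hypothetical sketch: kappa * P(m | s) as a product of per-point
   likelihoods looked up in a precomputed grid. *)

type pose = { x : float; y : float; theta : float }

(* cells.(i).(j) stores the likelihood of a point falling in that cell;
   resolution is the cell size in meters (0.1 for a 10 cm grid). *)
type likelihood_map = { cells : float array array; resolution : float }

let lookup (lm : likelihood_map) (px : float) (py : float) : float =
  let i = int_of_float (floor (px /. lm.resolution)) in
  let j = int_of_float (floor (py /. lm.resolution)) in
  if i >= 0 && i < Array.length lm.cells
     && j >= 0 && j < Array.length lm.cells.(i)
  then lm.cells.(i).(j)
  else 1e-6   (* assumed floor value for points outside the map *)

(* Project each range reading to a point in the map, given pose s, and
   multiply the likelihoods of all points. *)
let perception (lm : likelihood_map) (m : float array) (s : pose) : float =
  let n = Array.length m in
  let pi = 4.0 *. atan 1.0 in
  let acc = ref 1.0 in
  Array.iteri
    (fun k d ->
      let bearing =
        if n <= 1 then 0.0
        else (pi *. float_of_int k /. float_of_int (n - 1)) -. (pi /. 2.0) in
      let ang = s.theta +. bearing in
      acc := !acc *. lookup lm (s.x +. d *. cos ang) (s.y +. d *. sin ang))
    m;
  !acc
```

A partial application such as perception lm m has type pose -> float, the same shape as the function $f(s) = \kappa\mathcal{P}(m \mid s)$ that the update for equation (5.2) expects.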

Now, if $M_{\mathcal{A}}$ denotes conditional probability $\mathcal{A}$ and $M_{\mathcal{P}}\ m$ returns a function $f(s) = \kappa\mathcal{P}(m \mid s)$, we
implement update equations (5.1) and (5.2) as follows:

    let Belnew = prob sample s' from Bel in
                      sample s from M_A (a, s') in
                      s                                   (5.1)

    let Belnew = bayes (M_P m) Bel                        (5.2)

Both pose tracking and global localization are achieved by specifying an appropriate initial probability
distribution of robot pose. For pose tracking, we use a point-mass distribution or a Gaussian distribution;
for global localization, we use a uniform distribution over the open space in the map.

¹Our implementation filters out outlier range readings in $m$ before computing $\kappa\mathcal{P}(m \mid s)$.

Figure 5.3: Points in the map that correspond to measurements when s is set to the true pose of the robot.

Figure 5.4: Points in the map that correspond to measurements when s is set to a hypothetical pose of the robot.
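When a belief is approximated by a finite set of samples, the two update equations (5.1) and (5.2) can be mimicked directly in Objective CAML. The sketch below is an array-based stand-in and not PTP's own implementation: it assumes equally weighted samples and realizes the second update by importance resampling, a common technique that is named here as an assumption. The names belief, predict and bayes are chosen for illustration.

```ocaml
(* Hypothetical sketch: a Bayes filter over a finite set of samples. *)

(* A distribution approximated by equally weighted samples. *)
type 'a belief = 'a array

(* Update (5.1): push every sample through the action model. *)
let predict (sample_action : 'a -> 'a) (bel : 'a belief) : 'a belief =
  Array.map sample_action bel

(* Update (5.2): weight each sample by f s = kappa * P(m | s), then
   resample in proportion to the weights. The unknown constant kappa
   cancels because only relative weights matter. *)
let bayes (f : 'a -> float) (bel : 'a belief) : 'a belief =
  let n = Array.length bel in
  let w = Array.map f bel in
  let total = Array.fold_left (+.) 0.0 w in
  if n = 0 || total <= 0.0 then bel   (* no information: keep the belief *)
  else
    Array.init n (fun _ ->
        let target = Random.float total in
        let rec pick i acc =
          let acc = acc +. w.(i) in
          if acc > target || i = n - 1 then bel.(i) else pick (i + 1) acc
        in
        pick 0 0.0)
```

One step of the filter is then bayes f (predict step bel), where step draws from the action model for the current action and f scores a pose against the current measurement.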

Experimental  results 

To test the robot localizer, we use a Nomad XR4000 mobile robot in Wean Hall at Carnegie Mellon
University. The robot is equipped with 180 laser range finders (one for each degree so as to cover 180 degrees).
The robot localizer uses every fifth range reading, and thus a measurement consists of a batch of $180 / 5 = 36$
range readings. We use CARMEN [49] for controlling the robot and collecting sensor readings. The robot
localizer runs on a Pentium III 500 MHz with 384 MBytes memory.

We  test  the  robot  localizer  for  global  localization.  The  initial  probability  distribution  of  robot  pose  is 
a  uniform  distribution  over  the  open  space  in  the  map,  which  is  approximated  with  100,000  samples.  The 
first  batch  of  range  readings  is  processed  according  to  update  equation  (5.2).  The  resultant  probability 
distribution,  which  is  still  approximated  with  100,000  samples,  is  then  replaced  by  its  support  approximated 
with  500  samples.  The  number  of  samples,  100,000  or  500,  is  chosen  empirically  —  both  too  many  and  too 
few  samples  prevent  the  probability  distribution  from  converging  to  a  correct  pose. 

Figure  5.7  shows  a  probability  distribution  of  robot  pose  after  processing  the  first  batch  of  range  readings 
in  Figure  5.1;  pluses  represent  samples  generated  from  the  probability  distribution.  The  robot  starts  right 
below  character  A,  but  there  are  relatively  few  samples  around  the  true  position  of  the  robot.  Figure  5.8 
shows  the  progress  of  a  real-time  robot  localization  run  that  continues  with  the  probability  distribution  in 
Figure  5.7.  The  first  two  pictures  show  that  the  robot  localizer  is  still  performing  global  localization.  The 
last picture shows that the robot localizer has started pose tracking as the probability distribution of robot
pose has converged to a single hypothesis.

Figure 5.5: Approximating $m_s$ from measurement $m$ and pose $s$.

We test the robot localizer with 8 runs, each of which takes a different path. In a test experiment, it
succeeds in localizing the robot on 5 runs and fails on 3 runs. (The result should not be considered statistically
significant.) As a comparison, the CARMEN robot localizer, which uses particle filters and is written in C,
succeeds on 3 runs and fails on 5 runs under the same condition (100,000 samples during initialization, 500
samples during localization, and 36 range readings in each measurement). Note that the same sequence of
sensor readings does not guarantee the same result because of the probabilistic nature of the robot localizer.
In the worst scenario, for example, the initial probability distribution of robot pose may have no samples
around the true pose, in which case the robot localizer is unlikely to recover from errors. Hence it is difficult
to precisely quantify the performance of the robot localizer; the goal is to show that our implementation
in PTP performs acceptably in practice.


5.3  People  tracking 

People  tracking  [50]  is  an  extension  of  robot  localization  in  that  it  estimates  not  only  the  robot  pose  but 
also  positions  of  people  (or  unmapped  objects).  As  in  robot  localization,  the  robot  takes  an  action  to  change 
its  pose.  Unlike  in  robot  localization,  however,  the  robot  categorizes  sensor  readings  in  a  measurement 
by  deciding  whether  they  correspond  with  objects  in  the  map  or  with  people.  Those  sensor  readings  that 
correspond  with  objects  in  the  map  are  used  to  update  the  robot  pose;  the  rest  of  sensor  readings  are  used  to 
update  positions  of  people. 

A simple approach is to maintain a probability distribution $Bel(s, u)$ of robot pose $s$ and positions $u$
of people. Although it works well for pose tracking, this approach is not a general solution for global
localization. The reason is that sensor readings from people are correctly interpreted only with a correct
hypothesis on the robot pose, but during global localization, there may be incorrect hypotheses that lead
to incorrect interpretation of sensor readings. For example, the two objects in the upper right region in
Figure 5.1 are interpreted as a person only with a correct hypothesis on the robot pose. This means that
during global localization, there exists a dependence between the robot pose and positions of people, which
is not captured by $Bel(s, u)$.

Hence we maintain a probability distribution $Bel(s, P_s(u))$ of robot pose $s$ and probability distribution
$P_s(u)$ of positions $u$ of people conditioned on robot pose $s$.² $P_s(u)$ captures the dependence between the
robot pose and positions of people. $Bel(s, P_s(u))$ can be thought of as a probability distribution over
probability distributions.

²Our implementation assumes that people move independently of each other, and represents $P_s(u)$ as a set of independent
probability distributions each of which keeps track of the position of an individual person.

Figure 5.6: A grid map and its likelihood map.

As in robot localization, we update $Bel(s, P_s(u))$ with a Bayes filter. The difference from robot
localization is that the state is a pair of $s$ and $P_s(u)$ and that the action model takes as input both an action $a$ and
a measurement $m$. We use update equations (5.3) and (5.4) in Figure 5.9 (which are obtained by replacing
$s$ by $s, P_s(u)$ and $a$ by $a, m$ in update equations (1.1) and (1.2)).

The action model $\mathcal{A}(s, P_s(u) \mid a, m, s', P_{s'}(u'))$ generates $s, P_s(u)$ from $s', P_{s'}(u')$ utilizing action $a$ and
measurement $m$. We first generate $s$ and then $P_s(u)$ according to equation (5.5) in Figure 5.9. We write
the first $Prob$ in equation (5.5) as $\mathcal{A}_{robot}(s \mid a, m, s', P_{s'}(u'))$. The second $Prob$ in equation (5.5) indicates
that we generate $P_s(u)$ from $P_{s'}(u')$ utilizing action $a$ and measurement $m$, which is exactly a situation
where we can use another Bayes filter. For this inner Bayes filter, we use update equations (5.6) and (5.7)
in Figure 5.9. We write $Prob$ in equation (5.6) as $\mathcal{A}_{people}(u \mid a, u', s, s')$; we simplify $Prob$ in equation (5.7)
into $Prob(m \mid u, s)$ because $m$ does not depend on $s'$ if $s$ is given, and write it as $\mathcal{P}_{people}(m \mid u, s)$.

Figure 5.10 shows the implementation of people tracking in PTP. $M_{\mathcal{A}_{robot}}$ and $M_{\mathcal{A}_{people}}$ denote
conditional probabilities $\mathcal{A}_{robot}$ and $\mathcal{A}_{people}$, respectively. $M_{\mathcal{P}_{people}}\ m\ s$ returns a function
$f(u) = \kappa\mathcal{P}_{people}(m \mid u, s)$ for a constant $\kappa$. Since both $m$ and $s$ are fixed when computing $f(u)$, we consider
only those range readings in $m$ that correspond with people. In implementing update equation (5.4), we use
the fact that $\mathcal{P}(m \mid s, P_s(u))$ is the expectation of a function $g(u) = \mathcal{P}_{people}(m \mid u, s)$ with respect to $P_s(u)$:

(5.8)  $\mathcal{P}(m \mid s, P_s(u)) = \int \mathcal{P}_{people}(m \mid u, s)\, P_s(u)\, du$
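Since equation (5.8) is an expectation, it can be estimated by averaging $g$ over samples drawn from $P_s(u)$. The following is a minimal Monte Carlo sketch, assuming that the distribution is available as a sampling procedure of type unit -> 'a; the name expectation and its argument order are hypothetical and do not describe PTP's expectation query.

```ocaml
(* Hypothetical sketch: Monte Carlo estimate of the expectation of g
   with respect to a distribution given as a sampler. *)
let expectation (n : int) (sampler : unit -> 'a) (g : 'a -> float) : float =
  if n <= 0 then invalid_arg "expectation: n must be positive";
  let sum = ref 0.0 in
  for _i = 1 to n do
    sum := !sum +. g (sampler ())
  done;
  !sum /. float_of_int n
```

For equation (5.8), sampler would draw a position $u$ from $P_s(u)$ and g would evaluate $\mathcal{P}_{people}(m \mid u, s)$ for the fixed $m$ and $s$; the accuracy of the estimate depends on the number of samples n.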

Our implementation further simplifies the models used in the update equations. We use $\mathcal{A}_{robot}(s \mid a, s')$
instead of $\mathcal{A}_{robot}(s \mid a, m, s', P_{s'}(u'))$ as in robot localization. That is, we ignore the interaction between
the robot and people when generating new poses of the robot. Similarly we use $\mathcal{A}_{people}(u \mid u')$ instead of
$\mathcal{A}_{people}(u \mid a, u', s, s')$ on the assumption that positions of people are not affected by the robot pose; $u$ is
obtained by adding a random noise to $u'$. We also simplify $\mathcal{P}(m \mid s, P_s(u))$ in update equation (5.4) into
$\mathcal{P}(m \mid s)$, which is computed in the same way as in robot localization; hence equation (5.8) is not actually
exploited in our implementation.

Figure 5.7: Probability distribution of robot pose after processing the first batch of range readings in Figure 5.1.

Experimental  results 

We test the people tracker on the same robot and machine that are used in robot localization. The people
tracker uses the implementation in Figure 5.10 during global localization, but once it succeeds in localizing
the robot and starts pose tracking, it maintains a probability distribution $Bel(s, u)$ as there is no longer a
dependence between the robot pose and positions of people. As with the robot localizer, we do not intend to
quantitatively measure the success rate of people tracking; rather, the focus is on showing that our
implementation in PTP is usable in practice.

Figure 5.11 shows the progress of a real-time people tracking run which uses the same sequence of
sensor readings as Figure 5.8. The first picture is taken after processing the first batch of range readings
in Figure 5.1; pluses (+) represent robot poses and crosses (×) represent positions of people. The second
picture shows that the people tracker is still performing global localization. The last picture shows that the
people tracker has started pose tracking; the position of each person in sight is indicated by a grey dot.
Figure 5.12 shows range readings when the third picture in Figure 5.11 is taken; the right picture shows
a magnified view of the area around the robot. Note that a person may be occluded by another person or
objects in the map, so grey dots do not always reflect the movement of people instantly. A refined action
model for people (e.g., $\mathcal{A}_{people}(u \mid a, u', s, s')$ or one estimating not only the position but also the velocity of
each person) would alleviate the problem.


5.4  Robotic  mapping 

Robotic mapping [75] is the problem of building a map (or a spatial model) of the environment from sensor
readings. Since measurements are a sequence of inaccurate local snapshots of the environment, a robot
simultaneously localizes itself as it explores the environment so that it corrects and aligns local snapshots to
construct a global map. For this reason, robotic mapping is also referred to as simultaneous localization and
mapping (or SLAM).
Taking a probabilistic approach, we formulate the robotic mapping problem with a Bayes filter which
maintains a probability distribution $Bel(s, g)$ of robot pose $s$ and map $g$. Given an action $a$ and a
measurement $m$, we update $Bel(s, g)$ as follows:

(5.9)  $Bel(s, g) \leftarrow \int \mathcal{A}(s, g \mid a, s', g')\, Bel(s', g')\, d(s', g')$

(5.10)  $Bel(s, g) \leftarrow \eta\, \mathcal{P}(m \mid s, g)\, Bel(s, g)$

We assume that an action is independent of the map and does not change the environment; that is,
$\mathcal{A}(s, g \mid a, s', g') = \mathcal{A}(s \mid a, s')$ if $g = g'$, and $\mathcal{A}(s, g \mid a, s', g') = 0$ if $g \neq g'$. Then we can simplify
update equation (5.9) as follows:


(5.11)  $Bel(s, g) \leftarrow \int \mathcal{A}(s \mid a, s')\, Bel(s', g)\, ds'$

Therefore the action model becomes the same as in robot localization. We implement the new update
equation (5.11) as follows:

    let Belnew = prob sample (s', g) from Belold in
                      sample s from M_A (a, s') in
                      (s, g)

The update equation (5.10) is implemented with a Bayes operation as before.

Unfortunately the space of maps has a huge dimension, which makes it impossible to maintain $Bel(s, g)$
without simplifying the representation of maps. Therefore we usually make additional assumptions on maps to
derive a specific representation. For example, assuming that a map consists of a set of landmarks whose
locations are estimated with Gaussian distributions, we can use a Kalman filter instead of a general Bayes
filter. If measurements, or local snapshots of the environment, are assumed to be accurate relative to robot
poses, we can represent a map by the sequence of robot poses when the measurements are taken, as in [38].
We can also exploit expectation maximization [14], in which we perform hill climbing in the space of maps
to find the most likely map. This approach does not maintain a probability distribution over maps because it
keeps only one (most likely) map at each iteration.

Here we assume that the environment consists of an unknown number of stationary landmarks. Then
the goal is to estimate positions of landmarks as well as the robot pose. The key observation is that we
may think of landmarks as people who never move in an empty environment. It means that the problem is
a special case of people tracking and we can use all the equations in Figure 5.9. Below we use subscript
landmark instead of people for the sake of clarity.

As in people tracking, we maintain a probability distribution $Bel(s, P_s(u))$ of robot pose $s$ and
probability distribution $P_s(u)$ of positions $u$ of landmarks conditioned on robot pose $s$. Since landmarks are
stationary and $\mathcal{A}_{landmark}(u \mid a, u', s, s')$ is non-zero if and only if $u = u'$, we skip update equation (5.6) in
implementing update equation (5.3). $\mathcal{A}_{robot}$ in equation (5.5) uses $\mathcal{P}_{landmark}(m \mid u', s)$ to test the likelihood
of each new robot pose $s$ with respect to old positions $u'$ of landmarks, as in FastSLAM 2.0 [48]:


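(5.12)
$$
\begin{aligned}
\mathcal{A}_{robot}(s \mid a, m, s', P_{s'}(u'))
  &= \int Prob(s \mid a, m, s', u')\, P_{s'}(u')\, du' \\
  &= \int \frac{Prob(s \mid a, u')\, Prob(m, s' \mid s, a, u')}{Prob(m, s' \mid a, u')}\, P_{s'}(u')\, du' \\
  &= \int \eta''\, Prob(m, s' \mid s, a, u')\, P_{s'}(u')\, du'
     \quad \text{where } \eta'' = \frac{Prob(s \mid a, u')}{Prob(m, s' \mid a, u')} \\
  &= \int \eta''\, Prob(s' \mid s, a, u', m)\, Prob(m \mid s, a, u')\, P_{s'}(u')\, du' \\
  &= \int \eta''\, Prob(s' \mid s, a)\, Prob(m \mid s, u')\, P_{s'}(u')\, du' \\
  &\propto \mathcal{A}_{robot}(s \mid a, s') \int \mathcal{P}_{landmark}(m \mid u', s)\, P_{s'}(u')\, du'
\end{aligned}
$$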
Given $a$ and $s'$, we implement equation (5.12) with a Bayes operation on $\mathcal{A}_{robot}(\cdot \mid a, s')$.

Figure 5.13 shows the implementation of robotic mapping in PTP. Compared with the implementation
of people tracking in Figure 5.10, it omits update equation (5.6) and incorporates equation (5.12).
$M_{\mathcal{A}_{robot}}$ and $M_{\mathcal{P}_{landmark}}$ denote conditional probabilities $\mathcal{A}_{robot}$ and $\mathcal{P}_{landmark}$, respectively, as in people tracking.
Since landmarks are stationary, we no longer need a term for the action model of landmarks (the counterpart of
$\mathcal{A}_{people}$). If we approximate $Bel(s, P_s(u))$ with a
single sample (i.e., with one most likely robot pose and an associated map), update equation (5.4) becomes
unnecessary.

Experimental  results 

To test the mapper, we use a data set collected with an outdoor vehicle in Victoria Park, Sydney [1]. The
mapper runs on the same machine that is used in robot localization and people tracking (Pentium III 500 MHz
with 384 MBytes memory). The data set is collected while the vehicle moves approximately 323.42 meters
(according to the odometry readings) in 128.8 seconds. Since the vehicle is driving over uneven terrain, raw
odometry readings are noisy and do not reflect the true path of the vehicle, in particular when the vehicle
follows a loop.

Figure 5.14 shows raw odometry readings in the data set. The true positions of the vehicle measured
by a GPS sensor are represented by crosses, which are available only for part of the entire traverse and
are not exploited by the mapper. Note that the odometry readings eventually diverge from the true path
of the vehicle. Figure 5.15 shows the result of the robotic mapping experiment in which we approximate
$Bel(s, P_s(u))$ with a single sample and use 1,000 samples for the expectation query and the Bayes operation.
The circles represent landmark positions (mean of their probability distributions). The mapper successfully
closes the loop, building a map of the landmarks around the path. The experiment, however, takes 145.89
seconds, which is 13.26% longer than it takes to collect the data set (128.8 seconds).

5.5  Summary 

PTP is a probabilistic language which allows programmers to concentrate on how to formulate probabilistic
computations at the level of probability distributions, regardless of the kind of probability distributions
involved. The three applications in robotics substantiate the practicality of PTP by illustrating how to directly
translate a probabilistic computation into code and providing experimental results on real robots.

Our finding is that the benefit of implementing probabilistic computations in PTP, such as improved
readability and conciseness of code, can outweigh its disadvantage in speed. For example, our robot localizer
is 1307 lines long (826 lines of Objective CAML/PTP code for probabilistic computations and 481 lines of
C code for interfacing with CARMEN) whereas the CARMEN robot localizer, which uses particle filters
and is written in C, is 3397 lines long. (Our robot localizer also uses the translator of PTP, which is 306 lines
long: 53 lines of CAMLP4 code and 253 lines of Objective CAML code.) The comparison is, however,
not conclusive because not every piece of code in CARMEN contributes to robot localization. Moreover,
the reduction in code size is also attributed to the use of Objective CAML as the host language. Hence the
comparison should not be taken as indicative of reduction in code size due to PTP alone. The speed loss is
also not significant. For example, while the CARMEN robot localizer processes 100.0 sensor readings, our
robot localizer processes on average 54.6 sensor readings (and nevertheless shows comparable accuracy).

On the other hand, PTP does not allow programmers to exploit a particular representation scheme for
probability distributions, which is inevitable for achieving high scalability in some applications. In the
robotic mapping problem, for example, one may choose to approximate the position of each landmark with a
Gaussian distribution. As the cost of representing a Gaussian distribution is relatively low, the approximation
makes it possible to build a highly scalable mapper. For example, Montemerlo [48] presents a FastSLAM
2.0 mapper which handles maps with over 1,000,000 landmarks. For such a problem, PTP would be useful
for quickly building a prototype implementation to test the correctness of a probabilistic computation.


Figure 5.8: Progress of a real-time robot localization run. Taken at 20 seconds, 40 seconds, and 80 seconds after
processing the first batch of sensor readings in Figure 5.1.

Figure 5.9: Equations used in people tracking. (5.3) and (5.4) for the Bayes filter computing Bel(s, P_s(u)). (5.5) for
decomposing the action model. (5.6) and (5.7) for the inner Bayes filter computing P_s(u).


Figure 5.10: Implementation of people tracking in PTP. Numbers on the right-hand side show corresponding
equations in Figure 5.9.
Figure 5.11: Progress of a real-time people tracking run. Taken at 0 seconds, 20 seconds, and 70 seconds after
processing the first batch of sensor readings in Figure 5.1.

Figure 5.12: Range readings and the area around the robot during a people tracking run.

Figure 5.13: Implementation of robotic mapping in PTP.
Chapter 6

Conclusion 


We have presented a probabilistic language PTP whose mathematical basis is sampling functions. PTP
supports all kinds of probability distributions (discrete distributions, continuous distributions, and even those
belonging to neither group) without drawing a syntactic or semantic distinction. We have developed a
linguistic framework λ○ for PTP and demonstrated the use of PTP with three applications in robotics. To the
best of our knowledge, PTP is the only probabilistic language with a formal semantics that has been applied
to real problems involving continuous distributions. There are a few other probabilistic languages that are
capable of simulating continuous distributions (by combining an infinite number of discrete distributions),
but they require a special treatment such as the lazy evaluation strategy in [33, 59] and the limiting process
in [24].
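
As an informal illustration of the sampling-function idea, the following Objective CAML sketch treats a distribution as a mapping from a number u in (0.0, 1.0] to a value. It is plain OCaml rather than PTP syntax, and the names sampling_fn, bernoulli, exponential, point_or_uniform, random_unit, and draw are hypothetical, chosen only for this example. The point is that a discrete distribution, a continuous distribution, and a distribution belonging to neither group all share one representation.

```ocaml
(* Hypothetical illustration, not PTP syntax: a sampling function maps a
   number u in (0.0, 1.0] to a value of the probability domain. *)
type 'a sampling_fn = float -> 'a

(* Discrete: Bernoulli distribution with success probability p. *)
let bernoulli (p : float) : bool sampling_fn = fun u -> u <= p

(* Continuous: exponential distribution with the given rate, obtained by
   inverting its CDF. log u is finite because u > 0.0. *)
let exponential (rate : float) : float sampling_fn =
  fun u -> -. (log u) /. rate

(* Neither discrete nor continuous: a point mass at 0.0 with probability
   0.5, otherwise uniform over (0.0, 1.0]. *)
let point_or_uniform : float sampling_fn =
  fun u -> if u <= 0.5 then 0.0 else 2.0 *. (u -. 0.5)

(* Produce a number in (0.0, 1.0], rejecting 0.0 if it is ever generated. *)
let rec random_unit () : float =
  let u = Random.float 1.0 in
  if u > 0.0 then u else random_unit ()

(* Draw one sample by applying a sampling function to a fresh number. *)
let draw (f : 'a sampling_fn) : 'a = f (random_unit ())
```

For instance, draw point_or_uniform returns 0.0 about half of the time and a value in (0.0, 1.0] otherwise, without the caller needing to know which kind of distribution it is sampling.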

PTP does not support precise reasoning about probability distributions. Note, however, that this is
not an inherent limitation of PTP due to its use of sampling functions as the mathematical basis; rather
this is a necessary feature of PTP because precise reasoning about probability distributions is impossible in
general. In other words, if PTP supported precise reasoning, it would support a smaller number of probability
distributions and operations.
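
To make the notion of approximate reasoning concrete, here is a minimal Objective CAML sketch of a sample-based expectation estimate. It is not the implementation of the PTP expectation query; the function expectation, its parameter n, and the uniform sampler are assumptions introduced for this example. The estimate is simply the average of f over n samples, so its accuracy depends on how many samples are drawn.

```ocaml
(* Hypothetical helper: approximate the expectation of f under the
   distribution represented by sample, using n independent samples. *)
let expectation ~(n : int) (sample : unit -> 'a) (f : 'a -> float) : float =
  let sum = ref 0.0 in
  for _i = 1 to n do
    sum := !sum +. f (sample ())
  done;
  !sum /. float_of_int n

let () =
  Random.init 42;
  (* Assumed example distribution: uniform over the unit interval. *)
  let uniform () = Random.float 1.0 in
  let estimate = expectation ~n:1000 uniform (fun x -> x) in
  Printf.printf "estimated mean of uniform: %f\n" estimate
```

The true mean here is 0.5, and the printed estimate only approximates it; drawing more samples trades running time for accuracy.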

The utility of a probabilistic language depends on each problem to which it is applied. PTP is a good
choice for those problems in which all kinds of probability distributions are used or precise reasoning is
unnecessary. Robotics is a good example, since all kinds of probability distributions are used (even those
probability distributions similar to point-uniform in Section 3.2 are used in modeling laser range finders)
and also precise reasoning is unnecessary (sensor readings are inaccurate at any rate). On the other hand,
PTP may not be the best choice for those problems involving only discrete distributions, since its rich
expressiveness is not fully exploited and approximate reasoning may be too weak for discrete distributions.

Although we have presented only an operational semantics of PTP (which suffices for all practical
purposes), a denotational semantics can also be used to argue that PTP is a probabilistic language. It may
also answer important questions about PTP such as:

• What exactly is the expressive power of PTP?

• Can we encode any probability distribution in PTP?

• If not, what kinds of probability distributions are impossible to encode in PTP?

The challenge is that in the presence of fixed point constructs, measure theory does not come to our rescue
because of recursive equations. Hence a domain-theoretic structure for probability distributions should be
constructed to properly handle recursive equations. The work by Jones [30] suggests that such a structure
could be constructed from a domain-theoretic model of real numbers [17].

The development of PTP is an effort to marry, in one of many possible ways, two seemingly unrelated
disciplines: programming language theory and robotics. To programming language theory, it contributes a
new linguistic framework λ○ and another installment in the series of probabilistic languages. To robotics,
it sets a precedent that a high-level formulation of a problem does not always have to be discarded when it
comes to implementation. It remains to be seen in what other ways the two disciplines can be married.